Multiplex Preparation of Barcoded Gene Specific DNA Fragments

ABSTRACT

Methods of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific deoxyribonucleic acid (DNA) fragments from a template nucleic acid, e.g., ribonucleic acid (RNA), sample are provided. Aspects of the methods include employing a set of gene specific primer pairs, wherein each pair of gene specific primers is made up of a forward primer and a reverse primer, at least one of which includes a sample barcode domain. The methods find use in a variety of different applications, including high-throughput sequencing, e.g., expression profiling, applications, including of small biological samples, e.g., single-cells.

CROSS-REFERENCE TO RELATED APPLICATIONS

Pursuant to 35 U.S.C. § 119(e), this application claims priority to U.S. Provisional Application Ser. No. 62/765,124 filed on Aug. 17, 2018 and U.S. Provisional Application Ser. No. 62/799,448 filed Jan. 31, 2019, the disclosures of which applications are herein incorporated by reference.

INTRODUCTION

Multiplex polymerase chain reactions (multiplex PCR) include the simultaneous amplification of many DNA sequences in one reaction. Applications of multiplex PCR include, but are not limited to, the identification of mutations, gene deletions, and polymorphisms and the production or quantitation of amplicons for high throughput sequencing and genotyping. Multiplex reactions may include two or more target sequences with primer pairs or one template selectively amplified with primers designed to target specific regions. Additionally, such reactions may include multiple templates with regions amplified by multiple primer pairs.

During multiplex PCR, proper amplification requires optimal conditions. It is important to maintain controlled cycling and annealing temperatures and fine-tuned relative concentrations of primers, buffers, dNTPs, Taq DNA polymerase, template and other PCR reagents. Common problems associated with multiplex PCR include: i) mis-priming due to nonspecific primer binding to non-target templates; and ii) the formation of unwanted side products due to the presence of multiple primer pairs. In conjunction with several other sensitive procedural variables, these issues may lead to cross hybridization, and uneven or no amplification of some target sequences. Unwanted multiplex PCR side products form in the presence of multiple primer pairs. These side products may include homodimers, formed by inter-molecular base pairing between two similar primers, and heterodimers, formed from inter-molecular interactions between sense and antisense primers. Another undesirable occurrence is the formation of hairpins and fold-back structures from intra-molecular interactions.

Furthermore, the technological problems with design of PCR primers and assay optimization are compounded in the analysis of small nucleic acid samples from large numbers of biological samples, such as small amounts of clinical samples and single cells. One of the best strategies for processing large numbers of biological samples is labeling of target nucleic acids (e.g., mRNAs) present in every sample with unique sample-specific barcodes, where the barcodes are employed to denote the source of each particular sample. These sample-specific barcodes can be used for deconvolution of the final multiplex profiling data and assigning these data to specific originating samples. Current RNA barcode labelling technologies are based on the mixing together of biological samples, e.g., from single cells, with universal (not gene-specific) sample-specific barcoded primers (e.g., oligo dT), e.g., present in a microwell or in a single droplet. The plurality (i.e., more than one) of barcoded oligo dT primers are commonly synthesized on the surface of beads using combinatorial phosphoramidite chemistry with one specific barcode labeled oligo dT primer per bead. In another strategy, the barcoded primers with unique barcodes are synthesized separately, optionally immobilized on the beads or encapsulated in a matrix (e.g., acrylamide) and deposited in a separate compartment (e.g., microwell). Each oligo dT primer with a sample-specific barcode (e.g., attached to the single bead) is annealed to the conserved polyA+ portion of the mRNA and extended by reverse transcriptase. As a result, the oligo dT-extended first strand cDNA molecules derived from one sample are labeled with a sample-specific barcode, and these cDNA molecules may then be mixed together with other barcode labeled cDNAs derived from other samples (e.g., cDNAs derived from 10,000 cells in the current 10× Genomics platform), where the resultant pooled mixture may then be amplified as a pool and analyzed for sequence and composition using next generation sequencing (NGS) protocols.

To date, known barcode labelling strategies can only be used in multiplex targeted PCR assays with a combination of forward gene-specific primers, which allow one to amplify and analyze only the 3′-end of RNA molecules. Unfortunately, non-coding 3′-end portions of mRNAs are highly promiscuous in different cell types and disease states. As a result, the design of gene-specific primers for the amplification of 3′-ends of mRNA molecules is highly problematic and there are significant obstacles to using this strategy for expression profiling in different disease states, e.g., for profiling different cancer cell types. Furthermore, the barcoded oligo dT primers cannot be directly applied to a conventional multiplex PCR assay employing two gene-specific primers (forward and reverse), as the barcode domain of the oligo dT primers is not physically connected with any gene-specific primer or amplified product. Importantly, a multiplex targeted PCR assay commonly employs a set of hundreds to thousands of gene-specific primers. Therefore, it is technologically challenging to label such a plurality of primers with, e.g., 10,000 different sample-specific barcodes, and deliver each barcoded primer pool to different samples or cells.

In some instances, the multiplex PCR primers may also incorporate unique molecular identifiers (UMIs), highly complex (usually random) sequences which allow one to label each nucleic acid molecule used in the assay with a molecule-specific identifier. The UMIs are useful in the identification and elimination of PCR duplicate biases introduced during multiplex amplification steps.

Although genome-wide analysis of nucleic acid compositions can be achieved by many known technologies, multiplex PCR is a unique technology which can be designed for analysis of hundreds or thousands of distinct nucleic acid molecules or their fragments. The complexity and composition of a nucleic acid pool which needs to be amplified and analyzed in such multiplex targeted PCR assays is dependent on the specific application. As a result, reduction in complexity of analyzed nucleic acids provides dramatic improvement in the sensitivity and specificity of the assay for many critical applications, including single-cell analyses and clinical diagnostics. Combination of multiplex PCR assays with sample-specific barcodes is the next frontier in reducing the cost and improving the quality of the analysis of multiple biological samples for clinical and experimental research applications.

SUMMARY

Methods of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific deoxyribonucleic acid (DNA) fragments from a template nucleic acid, e.g., ribonucleic acid (RNA), sample are provided. Aspects of the methods include employing a set of gene specific primer pairs, wherein each pair of gene specific primers is made up of a forward primer and a reverse primer, at least one of which includes a sample barcode domain. The methods find use in a variety of different applications, including high-throughput sequencing, e.g., expression profiling, applications, including of small biological samples, e.g., single-cells.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 provides a schematic diagram of a method of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific deoxyribonucleic acid (DNA) fragments in accordance with an embodiment of the invention.

FIG. 2 provides a schematic diagram of a method of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific DNA fragments in accordance with another embodiment of the invention.

FIG. 3 provides a schematic diagram of a method of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific DNA fragments using reverse primers that include a barcode domain in accordance with an embodiment of the invention.

FIG. 4 provides a schematic diagram of a method of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific DNA fragments using bead linked reverse primers that include a barcode domain in accordance with an embodiment of the invention.

FIG. 5 provides a schematic diagram of a method of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific DNA fragments using bead linked sample barcode domain comprising reverse primers in a droplet mediated protocol in accordance with an embodiment of the invention.

FIG. 6 provides a schematic diagram of a method of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific deoxyribonucleic acid (DNA) fragments using specific binding member bead linked sample barcode domain comprising reverse primers in a droplet mediated protocol in accordance with an embodiment of the invention.

FIG. 7 shows Table 1 which provides examples of suitable sequences for 18 nt long barcode domains. The sequences in each vertical column from left to right are set forth in the following SEQ ID NOs: (i) 85-2104; (ii) 2105-4125; (iii) 4126-6106; (iv) 6107-8084.

FIG. 8 shows Table 2 which provides sequences of forward and reverse gene specific primers of gene specific primer pairs. The sequences are set forth in the following SEQ ID NOs: Forward: 8085-9444; Reverse: 9445-10640.

DEFINITIONS

As used herein, the term “hybridization conditions” means conditions in which a primer, or other polynucleotide, specifically hybridizes to a region of a target nucleic acid with which the primer or other polynucleotide shares some complementarity. Whether a primer specifically hybridizes to a target nucleic acid is determined by such factors as the degree of complementarity between the primer and the target nucleic acid and the temperature at which the hybridization occurs, which may be informed by the melting temperature (Tm) of the primer. The melting temperature refers to the temperature at which half of the primer-target nucleic acid duplexes remain hybridized and half of the duplexes dissociate into single strands. The Tm of a duplex may be experimentally determined or predicted using the following formula: Tm = 81.5 + 16.6(log10[Na+]) + 0.41(fraction G+C) − (60/N), where N is the chain length and [Na+] is less than 1 M. See Sambrook and Russell (2001; Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Press, Cold Spring Harbor N.Y., Ch. 10). Other more advanced models that depend on various parameters may also be used to predict Tm of primer/target duplexes depending on various hybridization conditions. Approaches for achieving specific nucleic acid hybridization may be found in, e.g., Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes, part I, chapter 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assays,” Elsevier (1993).
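By way of illustration only, the Tm formula recited above may be evaluated as in the following minimal sketch. The function name `estimate_tm` is hypothetical, and the sketch assumes that the G+C term is taken literally as a fraction between 0 and 1, as the formula is written here. Some published forms of this empirical formula express G+C as a percentage and use a different length correction, so the values returned should be treated as illustrative rather than as a validated prediction.

```python
import math


def estimate_tm(primer: str, na_molar: float) -> float:
    """Evaluate Tm = 81.5 + 16.6*log10[Na+] + 0.41*(fraction G+C) - 60/N.

    Assumption: G+C is used as a fraction (0 to 1), exactly as the formula
    is written in the text; this is an illustrative reading, not a
    validated thermodynamic model.
    """
    seq = primer.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("primer must not be empty")
    # The text states that the formula applies where [Na+] is less than 1 M.
    if not 0 < na_molar < 1:
        raise ValueError("[Na+] must be greater than 0 and less than 1 M")
    gc_fraction = sum(base in "GC" for base in seq) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_fraction - 60 / n


if __name__ == "__main__":
    # Hypothetical 20 nt primer at 0.1 M Na+.
    print(round(estimate_tm("ATGCATGCATGCATGCATGC", 0.1), 3))
```

Because the salt term is logarithmic, lowering [Na+] tenfold lowers the estimate by 16.6° C. regardless of primer sequence.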

The terms “complementary” and “complementarity” as used herein refer to a nucleotide sequence that base-pairs by non-covalent bonds to all or a region of a target nucleic acid (e.g., a region of the product nucleic acid). In the canonical Watson-Crick base pairing, adenine (A) forms a base pair with thymine (T), as does guanine (G) with cytosine (C) in DNA. In RNA, thymine is replaced by uracil (U). As such, A is complementary to T and G is complementary to C. In RNA, A is complementary to U and vice versa. Typically, “complementary” refers to a nucleotide sequence that is at least partially complementary. The term “complementary” may also encompass duplexes that are fully complementary such that every nucleotide in one strand is complementary to every nucleotide in the other strand in corresponding positions. In certain cases, a nucleotide sequence may be partially complementary to a target, in which not all nucleotides are complementary to every nucleotide in the target nucleic acid in all the corresponding positions. For example, a primer may be perfectly (i.e., 100%) complementary to the target nucleic acid, or the primer and the target nucleic acid may share some degree of complementarity which is less than perfect (e.g., 70%, 75%, 85%, 90%, 95%, 99%).

The percent identity of two nucleotide sequences can be determined by aligning the sequences for optimal comparison purposes (e.g., gaps can be introduced in a first sequence for optimal alignment). The nucleotides at corresponding positions are then compared, and the percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = # of identical positions / total # of positions × 100). When a position in one sequence is occupied by the same nucleotide as the corresponding position in the other sequence, then the molecules are identical at that position. A non-limiting example of a mathematical algorithm for comparing sequences is described in Karlin et al., Proc. Natl. Acad. Sci. USA 90:5873-5877 (1993). Such an algorithm is incorporated into the NBLAST and XBLAST programs (version 2.0) as described in Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., NBLAST) can be used. In one aspect, parameters for sequence comparison can be set at score=100, wordlength=12, or can be varied (e.g., wordlength=5 or wordlength=20).
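The percent identity calculation described above may be sketched as follows. This is a minimal illustration that assumes the two sequences have already been aligned to equal length, with gaps written as "-"; the function name `percent_identity` is hypothetical, and the alignment step itself (e.g., by BLAST) is outside the scope of the sketch.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return % identity = identical positions / total positions x 100.

    Assumptions: both inputs are already aligned and of equal length,
    gaps are written as '-', and a gap never counts as an identical
    position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    a, b = aligned_a.upper(), aligned_b.upper()
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100 * identical / len(a)


if __name__ == "__main__":
    # Hypothetical aligned pair with one mismatch in eight positions.
    print(percent_identity("ACGTACGT", "ACGTTCGT"))
```

Here the total number of positions is taken as the alignment length including gapped columns; other conventions for the denominator exist and would give different values for gapped alignments.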

A “domain” refers to a stretch or length of a nucleic acid made up of a plurality of nucleotides, where the stretch or length provides a defined function to the nucleic acid. Examples of domains include Barcoded Unique Molecular Identifier (BUMI) domains, primer binding domains, hybridization domains, barcode domains (such as source barcode domains), unique molecular identifier (UMI) domains, Next Generation Sequencing (NGS) adaptor domains, NGS indexing domains, etc. In some instances, the terms “domain” and “region” may be used interchangeably, including e.g., where immune receptor chain domains/regions are described, such as e.g., immune receptor constant domains/regions. While the length of a given domain may vary, in some instances the length ranges from 2 to 100 nucleotides (nt), such as 5 to 50 nt, e.g., 5 to 30 nt.

By “primer extension product composition” is meant a nucleic acid composition that includes nucleic acids that are primer extension products. Primer extension products are deoxyribonucleic acids that include a primer domain at the 5′ end covalently bonded to a synthesized domain at the 3′ end, which synthesized domain is a domain of base residues added by a polymerase mediated reaction to the 3′ end of the primer domain in a sequence that is dictated by a template nucleic acid to which the primer domain is hybridized during production of the primer extension product. Primer extension product compositions may include double stranded nucleic acids that include a template nucleic acid strand hybridized to a primer extension product strand, e.g., as described above. The length of the primer extension products and/or double stranded nucleic acids that incorporate the same in the primer extension product compositions may vary, wherein in some instances the nucleic acids have a length ranging from 50 to 1000 nt, such as 60 to 400 nt and including 70 to 250 nt. The number of distinct nucleic acids that differ from each other by sequence in the primer extension product compositions produced via methods of the invention may also vary, ranging in some instances from 10 to 25,000, such as 100 to 20,000 and including 1,000 to 10,000, 10,000 to 20,000, 15,000 to 20,000 and 15,000 to 19,000.

DETAILED DESCRIPTION

Methods of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific deoxyribonucleic acid (DNA) fragments from a template nucleic acid, e.g., ribonucleic acid (RNA), sample are provided. Aspects of the methods include employing a set of gene specific primer pairs, wherein each pair of gene specific primers is made up of a forward primer and a reverse primer, at least one of which includes a sample barcode domain. The methods find use in a variety of different applications, including high-throughput sequencing, e.g., expression profiling, applications, including of small biological samples, e.g., single-cells.

Before the present invention is described in greater detail, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

Certain ranges are presented herein with numerical values being preceded by the term “about.” The term “about” is used herein to provide literal support for the exact number that it precedes, as well as a number that is near to or approximately the number that the term precedes. In determining whether a number is near to or approximately a specifically recited number, the near or approximating unrecited number may be a number which, in the context in which it is presented, provides the substantial equivalent of the specifically recited number.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, representative illustrative methods and materials are now described.

All publications and patents cited in this specification are herein incorporated by reference as if each individual publication or patent were specifically and individually indicated to be incorporated by reference and are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.

It is noted that, as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.

As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. Any recited method can be carried out in the order of events recited or in any other order which is logically possible.

While the apparatus and method have been or will be described for the sake of grammatical fluidity with functional explanations, it is to be expressly understood that the claims, unless expressly formulated under 35 U.S.C. § 112, are not to be construed as necessarily limited in any way by the construction of “means” or “steps” limitations, but are to be accorded the full scope of the meaning and equivalents of the definition provided by the claims under the judicial doctrine of equivalents, and in the case where the claims are expressly formulated under 35 U.S.C. § 112 are to be accorded full statutory equivalents under 35 U.S.C. § 112.

In further describing various aspects of the invention, embodiments of various methods will be discussed first in greater detail, followed by a review of various applications in which the methods find use as well as kits that find use in various embodiments of the invention.

Methods

As summarized above, methods of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific deoxyribonucleic acid (DNA) fragments from a ribonucleic acid (RNA) sample are provided. Before describing the methods further, the product nucleic acids prepared by the methods are first described in greater detail.

Sample-Barcoded Anchor-Domain-Flanked Gene Specific DNA Fragments

By “sample-barcoded anchor-domain-flanked gene specific deoxyribonucleic acid (DNA) fragment” is meant a DNA that includes an anchor domain on each side of a gene specific domain. As the gene specific domain is flanked by anchor domains, the DNA fragments prepared by methods of the invention include a first anchor domain located at a first end of the DNA fragment and a second anchor domain located at a second end of the DNA fragment.

By gene specific domain is meant a region of the dsDNA fragment that includes a sequence found in a template nucleic acid, such as a template mRNA. While the length of the gene specific domain may vary, in some instances the gene-specific domain ranges in length from 50 to 500 nt, such as 60 to 300 nt.

In addition to the gene specific domains, as described above, the DNA fragments have anchor domains on either side of the gene specific domain. Anchor domains are domains that are employed in nucleic acid amplification, such as polymerase chain reaction (PCR), steps of the methods, where they serve as primer binding sites for the primers employed in such amplification steps. Where the amplification employed is PCR, the anchor domains may also be referred to as PCR primer binding domains. The length of the anchor domains may vary, as desired. In some instances, the anchor domains of each primer pair range in length from 10 to 50 nt, such as 10 to 30 nt, e.g., 10 to 24, including 12 to 20 nt. Where desired, the anchor domains may include PCR suppression sequences. PCR suppression sequences are sequences configured to suppress the formation of non-target DNA during PCR amplification reactions, e.g., via the production of pan-like structures. Such sequences, when present, may vary in length, ranging in some instances from 5 to 25 nt, such as 7 to 21, including 7 to 20 nt. PCR suppression sequences of interest include, but are not limited to, those sequences described in U.S. Pat. No. 5,565,340; the disclosure of which is herein incorporated by reference. Examples of forward and reverse anchor domains that include PCR suppression sequences are: AGCACCGACCAGCAGACA (SEQ ID NO:01) and AGACACGACCAGCCACGA (SEQ ID NO:02).

As summarized above, the DNA fragments are also “sample-barcoded”, by which is meant that they include a barcode domain that denotes, i.e., indicates or provides information about (such that it may be used to determine), the specific sample, e.g., cell, from which the fragment has been produced. Barcode domains include unique, specific sequences. While the length of a given barcode domain may vary, in some instances the length ranges from 6 to 30 nt, such as 8 to 20 nt, and including 12 to 18 nt. Examples of suitable sequences for 18 nt long barcode domains are provided in Table 1 in FIG. 7.

In addition to the gene-specific, barcode and anchor domains, the fragments produced by methods of the invention may further include a unique molecular index (i.e., unique molecular identifier or UMI) domain. UMI domains have sequences configured for labeling each of the plurality of RNA molecules (and extended cDNA products) present in the hybridization mix with a different molecule-specific index. UMI domains are stretches of random or semi-random nucleotides. While the lengths of UMI domains may vary, in some instances the length of a UMI domain ranges from 8 to 20 nt, which in a given assay provides a complexity of 10,000 or more different UMI sequences. In some instances, using at least 10,000 unique indexes is sufficient to label each template molecule having the same sequence with a unique index, i.e., UMI. By counting the indexes, e.g., via NGS, the number of template molecules having the same sequence that were employed in the multiplex PCR assay can be calculated. In some instances, the UMI domain may be combined with the barcode domain, e.g., where the UMI nucleotides are interspersed with the barcode nucleotides in a BUMI domain, e.g., as described in United States Patent Application Publication No. US20150072344, the disclosure of which is herein incorporated by reference.
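The use of UMIs to count template molecules, as described above, may be illustrated with the following minimal sketch. It assumes that each sequencing read has already been reduced to a (sample barcode, gene, UMI) tuple and that UMIs are compared by exact match, without any correction for sequencing errors. The function name `count_molecules` and the barcode, gene and UMI values shown are hypothetical.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs observed for each (sample barcode, gene) pair.

    `reads` is an iterable of (sample_barcode, gene, umi) tuples.
    Reads sharing all three values are treated as PCR duplicates of one
    template molecule (exact-match assumption; no UMI error correction).
    """
    umis_by_key = defaultdict(set)
    for sample_barcode, gene, umi in reads:
        umis_by_key[(sample_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_by_key.items()}


if __name__ == "__main__":
    # Hypothetical reads: the first two are duplicates of one molecule.
    example_reads = [
        ("BC01", "GENE_A", "AAACCCGG"),
        ("BC01", "GENE_A", "AAACCCGG"),
        ("BC01", "GENE_A", "GGTTAACC"),
        ("BC02", "GENE_A", "AAACCCGG"),
    ]
    print(count_molecules(example_reads))
```

Counting distinct UMIs rather than reads is what removes the PCR duplicate bias: a molecule amplified many times still contributes a single count.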

Also present in the DNA fragments produced by methods of the invention may be ligated linker domains. Ligated linker domains are domains having sequences found in first and second linker domains employed in methods of the invention, e.g., as described in greater detail below. In the ligated linker domain, the sequences of the first and second linker domains are joined to each other, such that the sequence of one of the linker domains begins at the end of the other linker domain. The length of the ligated linker domain may vary, and in some instances ranges from 15 to 60 nt, such as 20 to 50 nt, and including 24 to 40 nt. While not required, in some cases the ligated linker domain has a sequence with a GC-content in the range of 50% to 80%.

As indicated above, the methods are methods of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific DNA fragments from a template nucleic acid sample, e.g., a template ribonucleic acid (template RNA) sample. More specifically, the methods are multiplex methods of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific deoxyribonucleic acid (DNA) fragments from a template nucleic acid, e.g., RNA, sample, such that each DNA fragment of the plurality is produced at the same time from the RNA sample, e.g., each DNA fragment is produced simultaneously from the source RNA sample. The number of distinct DNA fragments prepared in a given method may vary, where in some instances the number in the plurality ranges from 10 to 25,000, such as 100 to 20,000 and including 1,000 to 10,000, 10,000 to 20,000, 15,000 to 20,000 and 15,000 to 19,000.

Among the DNA fragments of the plurality that are produced from a single sample by methods of the invention, a given DNA fragment is considered to be distinct from another DNA fragment if the gene-specific domains of the two fragments differ from each other by sequence. While the gene-specific domains of the DNA fragments in a given plurality may all differ from each other, e.g., because they include coding sequences of different genes, the DNA fragments will also include common domains, i.e., domains that are identical to each other (i.e., domains having sequences that do not differ from each other), where these domains are the flanking anchor domains, the barcode domains and the ligated linker domains. When employed, the DNA fragments may further differ with respect to additional domains, such as distinct UMI domains, such that the UMI domains of the DNA fragments have different sequences, i.e., they are not common or identical.

As indicated above, during a given protocol a plurality of DNA fragments produced from one sample may be combined, i.e., pooled, with one or more additional pluralities produced from one or more additional samples. In such pooled compositions, each plurality of the pooled composition will have a distinct barcode domain, such that the barcode domain of a first plurality of the composition will have a sequence that differs from every other barcode domain of every other plurality in the pooled composition. In a given pooled composition, each barcode domain has a sequence that is significantly different from that of any other barcode domain in the pooled composition, with a difference of at least 1 nucleotide, such as 2 nucleotides and including 3 or more nucleotide differences in the whole set of barcodes employed in the assay. In this way each plurality of the pooled composition will have a distinct identifying barcode domain. The number of different barcode domains in such pooled compositions is the same as the number of different pluralities in the pooled composition, where the number represents the number of different samples that are employed to make the pooled composition. The number of different barcodes present in a given pooled composition depends on the number of samples being analyzed in a given assay. In some instances, the number ranges from 10 to 1,000,000, such as 100 to 100,000, and including 1,000 to 10,000. For example, currently for analysis of single-cell samples, the number of barcodes may be 10,000 or more, but for analysis of clinical samples the number of barcodes may not exceed 1,000.
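The requirement that every barcode domain differ from every other by a minimum number of nucleotides may be checked as in the following minimal sketch. It assumes equal-length barcodes compared position by position (i.e., by Hamming distance, which is one possible way to count nucleotide differences); the function names and the default threshold of 3 are illustrative choices, not requirements of the methods.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes) -> int:
    """Smallest Hamming distance between any two barcodes in the set."""
    barcodes = list(barcodes)
    if len(barcodes) < 2:
        raise ValueError("at least two barcodes are required")
    return min(hamming_distance(a, b) for a, b in combinations(barcodes, 2))


def is_valid_barcode_set(barcodes, min_differences: int = 3) -> bool:
    """True if every pair of barcodes differs at min_differences or more positions."""
    return min_pairwise_distance(barcodes) >= min_differences


if __name__ == "__main__":
    # Hypothetical 6 nt barcodes, shorter than the 18 nt examples of Table 1.
    print(is_valid_barcode_set(["AAAAAA", "AAATTT", "TTTTTT"]))
```

The all-pairs comparison grows quadratically with the number of barcodes, which is acceptable for a sketch but may call for a more efficient approach with sets of 10,000 or more barcodes.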

Gene Specific Primers

As summarized above, in embodiments of the invention a set of gene specific primers, i.e., a collection of gene specific primer pairs of known sequence, is employed. While the number of primer pairs in a given set may vary, as desired, in some instances the number of primer pairs in a given set is 10 or more, such as 20 or more, 30 or more, 40 or more, 50 or more, 60 or more, 70 or more, 80 or more, 90 or more, 100 or more, 125 or more, 250 or more, 500 or more, including 1000 or more, 200 or more, 5000 or more, 8000 or more, 10,000 or more, 15,000 or more, 18,000 or more and 20,000 or more. In some instances, the number of gene specific primer pairs that is present in the set is 25,000 or less, such as 20,000 or less. As such, in some embodiments the number of gene specific primer pairs in the set that is employed in the methods ranges from 10 to 25,000, such as 50 to 20,000, including 1,000 to 10,000, e.g., 2,500 to 8,500, and 10,000 to 20,000, e.g., 15,000 to 19,000.

Gene specific primer pairs present in a given set of the invention aremade up of a forward primer and a reverse primer, wherein the forwardand reverse primers of each primer pair include gene specific domains,where these gene specific domains may be experimentally validated assuitable for use in a multiplex amplification assay. By “experimentallyvalidated as suitable for use in a multiplex amplification assay” ismeant that the set of primers for each target gene in a given set hasbeen experimentally tested in a multiplex amplification assay, such asdescribed in U.S. patent application Ser. No. 15/133,184 published asUS-2016-0376664-A1 and U.S. application Ser. No. 15/914,895 published asUS-2018-0245164-A1 (the disclosures of which are herein incorporated byreference), and the best performing primer set is selected based onparameters, e.g., one or more functional parameters, e.g., as describedin greater detail below. While the multiplex amplification assayemployed to experimentally validate a set of primers may vary, in someinstances the protocol employed includes a first step of, for eachtarget gene selected from the genome-wide set of human or mouse genes,selecting a region that is conservative for different mRNA isoforms,following which a set of forward and reverse PCR primers which arecomplementary and specific for the selected gene region are designed.The primers may be designed using any convenient algorithm and/orsoftware tool, e.g., such as the Primer3 algorithm, Primer Design Toolfrom NCI, etc. The melting temperature of the selected primers may vary,ranging in some instances from 60° C. to 80° C., such as 65° C. to 80°C. Furthermore, the primers may be selected that lack significantsecondary structures, or self-complementarity (e.g., primers may beselected with less than 4-bp complementary regions) andcross-complementarity to each other of less than 10 nt complementarityregion. 
The length of the selected PCR primers may vary, and in some instances ranges from 15 to 25 nt, such as 16 to 24 nt, with a GC-content of 45% to 85%, such as 50% to 75%. In order to avoid primer/dimer formation in a multiplex RT-PCR assay, the selected primers in some embodiments are designed with the nucleotide A at the 3′-end and a biased GCA-rich composition with a reduced percentage of T nucleotides, where in some instances the percentage of T is 20% or less, such as 15% or less, including 10% or less, down to 0%. Following primer design, homology searching for similar PCR primer binding domain(s) in other RNA species (such as those available in GenBank), e.g., via the BLAST or Thermo-Blast algorithm, is performed in order to select primers specific only to the target region of interest. Next, the resultant primer set is ranked based on the distance between primers relative to the preferred size of amplicons, which ranges in some instances from 60 to 250 base pairs (bp). Following this ranking, a set of at least 1 primer pair, such as 3 or more, e.g., 5 or more, up to 12 or more, but in some instances not exceeding 12 primer pairs, is synthesized and functionally validated in a multiplex Reverse Transcription (RT)-PCR-NGS (next generation sequencing) assay, e.g., using the protocol disclosed in the Experimental section, below. In some embodiments, e.g., those specific for mutation profiling in clinically actionable or cancer driver genes, a complete set of PCR primers is designed and validated which allows one to amplify a set of overlapping amplicons that cover the complete mRNA sequence from the 5′ to the 3′-end. Primers present in sets of gene specific primers of the invention may be experimentally validated using any convenient protocol.
In some instances, the experimentally validated gene specific domains are validated in a multiplex amplification assay with a synthetic control template mix which mimics the natural target template sequences and includes binding sites for the whole set of gene-specific primer pairs and/or a universal natural template mix derived from multiple different mammalian tissues or cell types. Specifically, as a template for the multiplex RT-PCR assay, a set (usually 3 to 6) of natural total universal RNAs, e.g., including a mix of several RNAs isolated from human or mouse cell lines or tissue samples (e.g., available from Takara-Clontech, Agilent, Qiagen, Origene, etc.) may be employed as a natural nucleic acid control. In addition (or alternatively) to the set of natural control template nucleic acids, a mix of synthetic control template nucleic acids, e.g., one that has been synthesized on the surface of custom microarrays (e.g., Custom Array or Agilent) and designed for each target amplicon, may be employed. Such synthetic control templates include the sequences of both PCR primer-binding site domains and the cDNA region between the PCR primers, either full-length or truncated in the middle, that corresponds to the primer extension domain. In some functional validation assays, two synthetic template concentrations (e.g., differing 10-fold) may be employed to measure expression level (number of specific reads) in a manner that is not dependent on the amount of starting universal RNA template. The length of synthetic control templates may vary, ranging in some instances from 100 to 200 nt, such as 110 to 180 nt, including 120 to 160 nt.
The amplification products generated in the multiplex RT-PCR assays may be quantitatively analyzed by sequence analysis using conventional NGS instruments (e.g., available from Illumina, Thermo-Fisher, Nanopore and other commercial vendors). The NGS data generated for different templates and experimental conditions may be scaled to the same number of total reads (usually 10,000,000 total reads) and aligned with the sequences of the PCR primer domain and downstream extended domain for each target amplicon. The number of specific reads corresponding to each target amplicon may be measured as the number of correctly aligned sequences for each PCR primer pair and downstream extended domain sequence. In addition, for each primer pair, the number of non-specific (off-target) reads may be calculated, i.e., reads for amplicons that have the correct PCR primer domain but different, non-target extended domain sequences. The set of PCR primer pairs designed for each target gene may then be ranked using the set of criteria described below. The highest rank PCR primer pair for each target gene is first selected based on the highest number of specific reads (e.g., 100 or more, such as 500 or more and including 1,000 specific reads) and minimum number of non-specific reads (e.g., 2-fold less than the number of specific reads, but not exceeding 5,000, or such as 2,000 reads) measured across all universal RNAs and the control synthetic template. Next, the highest activity PCR primer set may be selected from among other primers that demonstrate a common pattern of expression among the different natural universal RNAs used in the assay. A common pattern of expression between different primer sets indicates that they target the same conserved cDNA region, rather than a unique target region specific for particular mRNA isoform(s). In some embodiments, human PCR primers are selected that effectively amplify target regions from human but not from mouse universal RNAs.
In other embodiments, e.g., those specific for detection of clinically actionable mutations, not one but a complete set of PCR primers is selected which amplifies amplicons overlapping the whole mRNA/cDNA sequence. In some embodiments, specific activity of primers is assayed at thermocycling extension temperatures of both 60° C. and 65° C. Using these two different conditions enables the identification of primer pairs that demonstrate similar (e.g., less than 2-fold difference) specific activity across several control templates and universal RNAs. In some instances, if no PCR primer set with high specific activity is identified for a target gene (e.g., the candidates yield less than 500 reads with the control synthetic template and less than 100 reads with the universal RNAs), a new candidate PCR primer set for the failed gene(s) is designed and the validation protocol is repeated until a suitable set is found. As a result of functional validation experiments, one can select at least one PCR primer set for each target gene of interest that has high sensitivity and selectivity, e.g., for at least 90%, such as 95% or more, of the target genes of interest. Each pair of gene specific primers is configured to hybridize to a target nucleic acid sequence for which the primers are specific at locations that are separated by a known or predetermined distance, i.e., a template distance. The length of the template distance may vary, ranging in some instances from 50 to 750 bp, such as 60 to 500 bp, including 60 to 300 bp, e.g., 70 to 250 bp. As such, the product nucleic acid produced from the gene specific primers may have a central domain, i.e., extension domain, complementary to the template nucleic acid from which it is produced (that is, identical to the reverse-complement sequence of the template nucleic acid from which it is produced) that varies in length, ranging in some instances from 50 to 750 nt, such as 60 to 500 nt, including 60 to 400 nt, e.g., 60 to 300 nt, including 80 to 200 nt.
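
By way of illustration only, the read scaling and primer pair ranking described above may be sketched as follows. This is a minimal sketch and not the validation pipeline of the Experimental section: the names `scale_counts`, `PairCounts`, `passes` and `rank_pairs` are hypothetical, the default thresholds are simply the example values given above (100 or more specific reads; non-specific reads at least 2-fold lower than specific reads and not exceeding 5,000), and the read counts are assumed to have been obtained already by alignment.

```python
from dataclasses import dataclass

TARGET_TOTAL_READS = 10_000_000  # example scaling target mentioned in the text


def scale_counts(counts, total_reads, target_total=TARGET_TOTAL_READS):
    """Scale raw per-amplicon read counts to a common total number of reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    factor = target_total / total_reads
    return {name: n * factor for name, n in counts.items()}


@dataclass
class PairCounts:
    pair_id: str        # hypothetical identifier of a candidate primer pair
    specific: float     # scaled reads with correct primers and correct extension domain
    nonspecific: float  # scaled reads with correct primers but off-target extension


def passes(pair, min_specific=100, max_nonspecific=5000):
    """Apply the example thresholds: enough specific reads, few non-specific reads."""
    return (
        pair.specific >= min_specific
        and pair.nonspecific <= pair.specific / 2
        and pair.nonspecific <= max_nonspecific
    )


def rank_pairs(pairs, min_specific=100, max_nonspecific=5000):
    """Return passing pairs, most specific reads first, fewest non-specific as tie-break."""
    passing = [p for p in pairs if passes(p, min_specific, max_nonspecific)]
    return sorted(passing, key=lambda p: (-p.specific, p.nonspecific))
```

The first element returned by `rank_pairs` for the candidates of one target gene would correspond to the highest rank primer pair under these assumed thresholds; the expression-pattern and temperature criteria described above are not modeled here.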

A given gene specific primer may include a multiplex experimentally validated gene specific domain, e.g., as described above. The length of the gene specific domain may vary, so long as the domain serves to specifically hybridize to a target nucleic acid under hybridization conditions of interest. An example of such hybridization conditions is hybridization at 50° C. or higher and 0.1×SSC (15 mM sodium chloride/1.5 mM sodium citrate). In some embodiments, these hybridization conditions may be defined by the length and nucleotide sequence of the gene-specific domains of the PCR primers, the composition of the PCR buffer, the properties of the DNA polymerase and the conditions used in the primer extension step. Furthermore, hybridization conditions could be compatible with primer extension conditions, e.g., where both the hybridization and extension steps are performed in DNA polymerase reaction buffer (e.g., 1×HF or 1×GC buffer from Thermo-Fisher). In another embodiment, where the hybridization step and primer extension step are separate steps in the protocol, the hybridization buffers could contain additional components for increasing hybridization rate (e.g., CTAB, PEG, high salt concentration (e.g., 1 M or more), etc.), and for lysing the cells, denaturing proteins, and stabilizing RNA or protein (e.g., detergents, guanidinium salts, urea, PMSF, 2-mercaptoethanol, etc.). If reverse barcoded primers are hybridized with a target template RNA composition in cell extracts (e.g., single cells), the hybridization buffer could be optimized to provide highly denaturing conditions that stabilize the RNA. Furthermore, the reverse barcoded primers may be longer (such as 25 to 80 nt, such as 30 to 70 nt) than regular PCR primers to facilitate highly specific and stable interaction with target RNAs at elevated temperatures (e.g., 50° C. to 80° C.). Furthermore, the use of stringent hybridization conditions and the removal of non-binding primers permits the formation of specific complexes between reverse gene specific primers and target RNAs.
Therefore, the specificity of extension of reverse barcoded primers could be significantly defined by hybridization rather than by the specificity of the follow-up primer extension step. Primer extension temperatures may vary, ranging in some instances from 50 to 75° C., such as 60 to 72° C. As disclosed in the Experimental section below, in one of the embodiments a primer extension step is employed in which extension occurs between 60 and 65° C. using Phusion II DNA polymerase and HF or GC buffer reagents available from Thermo-Fisher. Both the length and the specific nucleotide sequence of the PCR primers define the hybridization condition at the primer extension step. In some embodiments, the length and specific sequence of the gene specific domains of the PCR primers are selected in order to provide efficient binding and extension at 60 and 65° C. under the PCR conditions used in the primer extension step. Such conditions may provide for high efficiency and specificity of the primer extension in PCR reaction conditions. In some embodiments, the primer length and sequence may be adjusted to perform an extension step at 68° C. or even 72° C.

To control the efficiency and specificity of the primer extension step, the length of the gene specific domain of the forward and reverse primers may vary. In some instances, the length ranges from 10 to 80 nt, such as 15 to 75 nt, e.g., 10 to 50 nt, such as 10 to 30 nt, including 14 to 22 nt or 16 to 24 nt. The gene specific domains of the forward and reverse primers may differ in length. In some instances, the gene specific domain of the forward primers is shorter than the gene specific domain of the reverse primers. For example, in some instances the length of the gene specific domain in the forward primers ranges from 15 to 30 nt, such as 18 to 25 nt, while the length of the gene specific domain in the reverse primers ranges from 25 to 80 nt, such as 30 to 70 nt, including 30 to 50 nt. Each primer of the gene specific primer set may include only a gene specific domain, or may include one or more additional domains as desired, e.g., anchor domains, NGS adaptor domains, labels or label domains, etc., e.g., as described below. In some embodiments where additional domains are present, each primer pair may include primers ranging in length from 10 to 150 nt, such as 10 to 100 nt, including 10 to 75 nt, such as from 15 to 60 nt, including from 24 to 45 nt.

Where desired, the gene-specific primer domain of each primer is GCA- and/or GCT-rich. By GCA- and/or GCT-rich is meant that the gene-specific primer domain is made up in substantial part of G, C and A nucleotides and/or G, C and T nucleotides. While the proportion of such nucleotides in a gene specific primer domain may vary, in some instances the proportion ranges from 75% to 100%, such as 85% to 100%. As the gene specific primer domains of such embodiments are GCA- and/or GCT-rich, the GC content of the gene specific primer domains is also high. While the GC content may vary, in some instances the GC content ranges from 40 to 90%, such as 45 to 85%, including 50 to 85%, e.g., 50 to 80%.
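
As a non-limiting illustration, the composition criteria above may be checked computationally. The sketch below is offered under stated assumptions: the function names `base_fraction` and `composition_report` are hypothetical, the cut-offs are the broadest example ranges given above (75% or more G/C/A or G/C/T; 40 to 90% GC), and the sequences used with it would be supplied by the user rather than taken from this disclosure.

```python
def base_fraction(seq, bases):
    """Fraction of positions in seq occupied by any nucleotide in `bases`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(seq.count(b) for b in bases) / len(seq)


def composition_report(gene_specific_domain):
    """Summarize GCA/GCT richness and GC content of a gene specific domain."""
    gca = base_fraction(gene_specific_domain, "GCA")
    gct = base_fraction(gene_specific_domain, "GCT")
    gc = base_fraction(gene_specific_domain, "GC")
    return {
        "gca_fraction": gca,
        "gct_fraction": gct,
        "gc_fraction": gc,
        # Example cut-off from the text: 75% to 100% G/C/A and/or G/C/T.
        "is_gca_or_gct_rich": max(gca, gct) >= 0.75,
        # Example range from the text: GC content of 40% to 90%.
        "gc_in_range": 0.40 <= gc <= 0.90,
    }
```

A domain lacking T nucleotides altogether has a GCA fraction of 1.0, which corresponds to the upper end of the 75% to 100% range.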

Depending on the specific application for which the set is configured, the set of gene specific primers may be configured to target a wide range of mammalian genes, and pathogenic genes from a wide range of pathogenic organisms, such as viruses, bacteria, fungi, etc., which could be present in human or other mammalian bodies. Of interest in certain applications are humans, mammalian species commonly used as model organisms to study human diseases, such as mouse, rat, or monkey, and pathogenic organisms associated with human diseases. To be analyzed in accordance with embodiments of the invention, the targeted genes may be present in mammalian cells or fluids. In some embodiments, the targeted genes may be protein coding, or may express non-coding RNAs, micro RNAs, mitochondrial RNAs, regulatory RNAs, etc. In some instances, the set of genes selected is genome-wide, such that it covers all genes present in the genome of an organism. In other embodiments, the genes are selected from the genes that could be transcribed or expressed in the organism and present in the biological samples in the form of RNA. The genome-wide set of genes specific for human, model and pathogenic organisms is of special interest in some instances and may be used to develop a set of genome-wide targeted RNA expression assays based on the disclosed multiplex PCR assay. Genome-wide sets of PCR primers may vary in number, and in some instances are configured to assay 18,000 or more, such as 20,000 or more and 25,000 or more, such as 30,000 or more genes. Additional sets of PCR primers may be configured based on a genome-wide set of genes from a wide range of viral, bacterial and eukaryotic pathogenic organisms. In another embodiment, the set of gene specific primers may be configured to produce primer extension products from a subset of specific genes selected from the genome-wide set of genes.
One of these subsets is the set of cancer associated genes, that is, the genes that have been shown to be associated with the initiation, development, diagnosis or treatment of cancer. Such genes could be implicated in, or be diagnostic of, or otherwise of interest in, the study and/or treatment of cancer, i.e., any of various malignant neoplasms characterized by the proliferation of anaplastic cells that tend to invade surrounding tissue and metastasize to new body sites. As such, cancer associated genes that may be represented in a given set of gene specific primers include, but are not limited to: cancer hallmark genes, pan-cancer driver genes, pathway and signaling network genes, drug metabolism genes, extracellular proteome genes, drug target genes (including those of FDA approved and/or clinical trial targets), cell lineage genes, immunity mechanisms & immunotherapy markers, immunotherapy drug target genes, known biomarkers, epigenetics genes, etc.

In another embodiment, the subset of the cancer associated genes is employed in developing a Cancer Clinically Actionable 26 assay for profiling all clinically actionable mutations in a set of 26 human genes (ABL1, AKT1, ALK, BRAF, CDK4, CDK6, CDKN2A, EGFR, ERBB2, FGFR1, FGFR2, FLT3, KDR, KIT, KRAS, MET, NRAS, PDGFRA, PIK3CA, PIK3R1, PTCH1, PTEN, PTPN11, RET, ROS1, SMO). This assay includes an additional set of multiplex PCR primers designed and validated to amplify a set of overlapping amplicons that cover the whole mRNA sequence of the target genes.

In another embodiment, the multiplex PCR assay is designed for analysis of a subset of cell-specific, tissue-specific or state-specific genes. These genes encode marker products (e.g., proteins, peptides or RNAs) that are specifically expressed in different cell types (e.g., markers for T, B, NK, stromal, cancer, epithelial, neuronal, etc. cells), different tissues or different cell states, e.g., marker products induced by treatment (e.g., drugs) or changes in conditions (e.g., heat shock), disease states (e.g., cancer, infection), or natural biological processes (e.g., differentiation, apoptosis, aging, etc.). A multiplex PCR assay based on a set of gene-specific primers specific for the marker genes may be employed in the development of prognostic and predictive clinical diagnostic tools and in profiling different cell types and their phenotypes in normal and disease states. For a marker gene analysis assay, the set of gene specific primer pairs includes primers configured to produce primer extension products for 10 or more genes listed in Table 2 (FIG. 8). As such, the set of gene specific primers employed in a given method may represent at least some of the genes listed in Table 2 (FIG. 8), such that the set may include primer pairs that correspond to at least some of the genes listed in Table 2 (FIG. 8).

A primer pair is considered to correspond to a given gene if the primers of the pair specifically hybridize to sequences of the gene. It is understood, based on current knowledge in the art, that the selected primer pair sequences could include all or only a portion of the primer sequences (e.g., those disclosed in Table 2 (FIG. 8)), so long as they provide for the desired gene specificity. Modifications in the specific sequences of the PCR primers, such as mutations, deletions, extensions, the use of nucleotide analogs, etc., may be present so long as the functionality of the primers in the primer extension step is maintained. The number of genes from Table 2 (FIG. 8) represented in the set of gene specific primers may vary, ranging from 10 to 10,000, including 25 to 10,000, 50 to 10,000, 100 to 10,000, where in some instances the number is 150 or more, such as 200 or more, 250 or more, 500 or more, 1,000 or more, up to and including all of the genes listed in Table 2. In some instances, the set of gene specific primers includes primer pairs having gene specific sequences listed in Table 2 (FIG. 8). The number of gene specific primer pairs having gene specific sequences listed in Table 2 (FIG. 8) that may be present in a given set of gene specific primers may vary, where in some instances the number ranges from 10 to 10,000, including 25 to 10,000, 50 to 10,000, 100 to 10,000, where in some instances the number is 150 or more, such as 200 or more, 250 or more, 500 or more, 1,000 or more, up to and including all of the primer pairs listed in Table 2. Subsets of the genes listed in Table 2 that may be employed in a given assay may vary.
Specific subsets of interest that may be employed in a given assay include but are not limited to: cell-specific markers, disease-specific markers, tissue-specific markers, or any specific set of genes selected based on specific functions, expression, or association with human diseases, and the like. Essentially any combination of primers, including all the primers, identified by the sequence identifiers provided in Table 2 (FIG. 8) may be assembled to form a set or subset of primer pairs of the present disclosure.

Sets and subsets of primer pairs may be configured to include or exclude multiple primer pairs for a particular gene. For example, a set or subset of primer pairs may include no, or essentially no, instances of two or more primer pairs that target the same gene. Alternatively, a set or subset of primer pairs may include two or more different primer pairs that target the same gene. Where two or more primer pairs for a particular gene are included in a set or subset, the primer pairs may or may not share the same forward primer or the same reverse primer. For example, in some instances, two primer pairs for a single gene may include the same forward primer but have different reverse primers, the same reverse primer but have different forward primers, or have different forward primers and different reverse primers.

As described above, in some instances, a set or subset of primer pairs may be configured such that no two primer pairs target the same gene, i.e., there is only one primer pair for each gene included in the set or subset. In some instances, the number of different primer pairs targeting the same gene may be low, including but not limited to, e.g., 10 or fewer primer pairs targeting each gene of the set or subset, such as 5 or fewer, 4 or fewer, 3 or fewer or no more than 2 primer pairs targeting each gene of a set or subset. In some instances, 10% or less of the genes of a set or subset may be targeted by more than one primer pair, including 8% or less, 7% or less, 5% or less, 3% or less, 2% or less and 1% or less.
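
For illustration, the redundancy of a given set may be summarized from a mapping of primer pairs to target genes. The following is a minimal sketch only; the function names and the pair and gene identifiers that would be passed to them are hypothetical and are not taken from Table 2.

```python
from collections import Counter


def pairs_per_gene(pair_to_gene):
    """Count how many primer pairs in the set target each gene."""
    return Counter(pair_to_gene.values())


def multi_pair_fraction(pair_to_gene):
    """Fraction of genes in the set that are targeted by more than one primer pair."""
    per_gene = pairs_per_gene(pair_to_gene)
    if not per_gene:
        raise ValueError("the primer set is empty")
    multi = sum(1 for n in per_gene.values() if n > 1)
    return multi / len(per_gene)


def max_pairs_for_any_gene(pair_to_gene):
    """Largest number of primer pairs targeting any single gene in the set."""
    per_gene = pairs_per_gene(pair_to_gene)
    if not per_gene:
        raise ValueError("the primer set is empty")
    return max(per_gene.values())
```

A set in which `multi_pair_fraction` returns 0.10 or less and `max_pairs_for_any_gene` returns 2 would fall within the example ranges recited above.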

In some instances, the methods include selecting the set of gene specific primers from a provided master library of gene specific primers, e.g., choosing a subset of primer pairs from an initial collection of primer pairs. For example, the methods may include selecting a subset of primer pairs (and thereby identifying the primer pairs of a set of gene specific primers to be employed in methods of the invention, such as described above) that correspond to genes from Table 2, where the number of primer pairs in the selected subset may vary, ranging in some instances from 10 to 10,000, including 25 to 10,000, 50 to 10,000, 100 to 10,000, where in some instances the number is 150 or more, such as 200 or more, 250 or more, 500 or more, 1,000 or more.

The disparate primer pairs of a given set are present in substantially the same, if not the same, amount. As such, in some instances, the copy number of any given primer pair in a set varies from the copy number of any other primer pair of the set by 100% or less, such as 50% or less. A given primer pair may be present in a set in any desired amount, where in some instances the amount ranges from 1% to 1000%, such as 5% to 500%, or 10% to 500%. The final concentration of each primer in the primer extension step may vary, and in some instances ranges from 0.01 to 50 nM, such as 0.01 to 20 nM, or 0.01 to 10 nM, where examples of specific concentrations of interest include 0.01 nM, 0.1 nM, 1 nM, 2 nM, 5 nM, 10 nM, 20 nM and 50 nM.

The sets of gene specific primers, e.g., as described above, are non-naturally occurring compositions. In some instances, the sets of gene specific primers include domains or regions that are not naturally occurring sequences and/or are not naturally joined to the gene specific primer domains in naturally occurring nucleic acids. For example, the gene specific domains may be joined to one or more synthetic domains, e.g., universal primer binding site domains, indexing domains, barcode domains, adaptor domains, anchor domains, linker domains, etc. In some instances the gene specific primers may include one or more moieties that are not present in naturally occurring nucleic acids, e.g., label moieties (e.g., directly detectable labels, such as fluorescent labels, indirectly detectable labels, e.g., components of a signal producing system, etc.), non-naturally occurring nucleotides, etc.

Depending on the particular protocol, the forward and reverse primers of the pairs of the sets of gene specific primers may be used together or separately. As such, a given method may include using the forward and reverse primer subsets of a given set separately, such as in methods where a template composition is first contacted with a subset of the reverse primers of a set, and then contacted with a subset of the forward primers of the set. Alternatively, a given method may include contacting a template with both the forward and reverse primer subsets of a given set of gene specific primers at the same time.

The methods of the invention are characterized by employing a set of gene specific primer pairs, wherein each pair of gene specific primers is made up of a forward primer and a reverse primer and at least one of the forward and reverse primers includes, at some time during the method, a sample barcode domain. In some instances, the methods are characterized by employing a sample-barcoded donor nucleic acid that includes an anchor domain and a sample barcode domain, wherein the sample-barcoded donor nucleic acid is employed in conjunction with a set of gene specific primer pairs, wherein each pair of gene specific primers is made up of a forward primer and a reverse primer and the methods include transferring at least the sample barcode domain to one of the forward and reverse primers. In other instances, the methods are characterized by employing a set of gene specific primers in which the reverse primers include a sample barcode domain.

Sample-Barcoded Donor Nucleic Acid Mediated Protocols

A sample-barcoded donor nucleic acid is an initial nucleic acid from which the sample barcode domain of the DNA fragments is derived. In other words, the sample-barcoded donor nucleic acid serves as the source of the barcode domain of the final sample barcoded DNA fragments. Since the sample-barcoded donor nucleic acid serves as the source of the sample barcode domain, the sample barcode domain does not need to be incorporated into any of the gene specific primers employed to produce gene specific fragments from a template nucleic acid sample. A sample-barcoded donor nucleic acid includes a sample barcode domain and an anchor domain, e.g., as described above. In addition, a given sample-barcoded donor nucleic acid may include a linker domain and/or a UMI domain, e.g., as described above.

In some instances, the donor nucleic acid includes a capture domain. In some such instances, the sample-barcoded donor nucleic acid includes a template-binding (e.g., RNA-binding) or capture domain, an anchor domain, and a barcode domain, and optionally a linker domain and/or a UMI domain. The template-binding domain is the sequence that binds the barcoded oligonucleotide to a template, such as DNA or RNA. Examples of template-binding domains include but are not limited to: oligo dT sequences, e.g., for binding to polyA tails of mRNA molecules, having from 15 to 35 dT nucleotides; random oligonucleotides having from 6 to 30 randomly synthesized A, T, G or C nucleotides; and semi-random oligonucleotides of 6 to 18 nucleotides in length designed against conserved regions in target templates, e.g., stretches of nucleotides coding triplets for the most abundant amino acids, splicing sites, etc. Where the target RNA template molecule is mRNA, the RNA binding domain may be a poly dT sequence that hybridizes to the mRNA polyA tail. The length of the RNA binding domain may vary, and in some instances ranges from 10 to 40 nt, such as 15 to 35 nt, including 20 to 30 nt. The linker domain is a domain having a sequence configured for binding and/or ligation with a linker domain present in another component of the assay, such as a gene specific primer, e.g., as described below. While the length of a given linker domain may vary, in some instances the length ranges from 5 to 30 nt, such as 10 to 25 nt, including 12 to 20 nt. There are no special requirements for the nucleotide composition or sequence of the linker domain, but in some instances the linker domain is selected to have a GC-content in the range of 50% to 80% and to be without significant secondary structure within the domain or with other domains present in the oligonucleotide.
In addition to the RNA binding domain, the anchor domain, barcode domain and linker domain, the donor nucleic acid may also include a UMI domain, e.g., as described above. In such instances, a given assay may employ a plurality of donor nucleic acids that have common RNA binding, anchor, barcode and linker domains but distinct UMI domains that differ from each other in terms of sequence. In such instances, the number of different donor nucleic acids that differ from each other in terms of their UMI domains, and in some instances solely in terms of their UMI domains, employed with a given RNA sample may vary, and in some instances may range from 1,000 to 20,000, such as 5,000 to 10,000.
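
As a simple worked illustration of the numbers above, the shortest UMI length able to encode a desired number of distinct donor nucleic acids follows from the fact that a domain of n nucleotides drawn from A, C, G and T has 4^n possible sequences. The sketch below computes only this lower bound; the function name `min_umi_length` is hypothetical, and the calculation assumes that every sequence of the given length is usable, which ignores any additional spacing between UMIs that may be desired for error tolerance.

```python
def min_umi_length(n_distinct, alphabet_size=4):
    """Smallest length L such that alphabet_size ** L >= n_distinct."""
    if n_distinct < 1:
        raise ValueError("n_distinct must be at least 1")
    length = 0
    capacity = 1  # number of distinct sequences of the current length
    while capacity < n_distinct:
        capacity *= alphabet_size
        length += 1
    return length
```

Under these assumptions, 1,000 distinct UMIs require at least 5 nt (4^5 = 1,024) and 20,000 distinct UMIs require at least 8 nt (4^8 = 65,536).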

The donor nucleic acids employed in methods of the invention may be in solution or bound to the surface of a solid phase, as desired. When bound to a surface of a solid phase (i.e., solid support), the donor nucleic acids may be covalently bound or non-covalently bound. The solid phase may vary, where examples of solid phases include, but are not limited to, beads, wells, plates, etc., e.g., made of a suitable solid phase material, such as a polymeric material, where the surface is configured to provide the desired bond to the donor nucleic acids.

In the sample-barcoded donor nucleic acids employed in methods of the invention, the order of the different domains may vary. Accordingly, in some embodiments a sample-barcoded donor nucleic acid may have the structure: 3′-linker 1-sample barcode domain-anchor 2 domain-RNA binding domain-5′. In yet other embodiments, the sample-barcoded donor nucleic acid comprises the structure: 3′-RNA binding domain-anchor 1 domain-sample barcode domain-linker 1 domain-5′. These various donor nucleic acids and the protocols in which they find use are further described below.
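
Because the domain structures above are written from the 3′ end to the 5′ end, while oligonucleotide sequences are conventionally written 5′ to 3′, it may be helpful to illustrate the conversion. The sketch below is an assumption-laden example only: the function name `to_five_prime_first` is hypothetical, every domain sequence shown is an invented placeholder (each written 5′ to 3′) rather than a sequence of this disclosure, and the example reproduces only the first structure recited above.

```python
def to_five_prime_first(structure_3_to_5, sequences):
    """Concatenate domain sequences into one 5'->3' string.

    `structure_3_to_5` lists domain names in the 3'->5' order used in the text;
    `sequences` maps each domain name to its own sequence written 5'->3'.
    """
    ordered = list(reversed(structure_3_to_5))  # now 5' -> 3'
    return "".join(sequences[name] for name in ordered)


# First structure from the text: 3'-linker 1-sample barcode-anchor 2-RNA binding-5'
DONOR_STRUCTURE_3_TO_5 = ["linker 1", "sample barcode", "anchor 2", "RNA binding"]

# Placeholder sequences, invented for illustration only.
EXAMPLE_SEQUENCES = {
    "RNA binding": "T" * 20,              # oligo dT for polyA tails
    "anchor 2": "ACACTCTTTCCCTACACGAC",   # hypothetical anchor
    "sample barcode": "GATTACAG",         # hypothetical 8-nt barcode
    "linker 1": "GACCTGCAGCTC",           # hypothetical 12-nt linker
}

donor_5_to_3 = to_five_prime_first(DONOR_STRUCTURE_3_TO_5, EXAMPLE_SEQUENCES)
```

In this representation the RNA binding domain lies at the 5′ end and the linker 1 domain at the 3′ end of `donor_5_to_3`, consistent with the first recited structure.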

With respect to donor nucleic acid mediated embodiments, in some instances, the additional domains present among a given pair of forward and reverse gene specific primers are linker and anchor domains. In some instances, the forward gene specific primer includes an anchor domain and the reverse gene specific primer includes a linker domain. In such instances, the forward primers may have the structure: 5′-anchor 1-forward gene specific primer (GSP) domain-3′; and the reverse primers may have the structure: 3′-reverse GSP domain-linker 2-5′. In other instances, the forward gene specific primer includes a linker domain and the reverse gene specific primer includes an anchor domain. In such instances, the forward primers may have the structure: 3′-forward GSP domain-linker 2-5′; and the reverse primers may have the structure: 5′-anchor 2-reverse GSP domain-3′.

In some instances, a reverse GSP mediated barcoding protocol is employed. In these embodiments, a sample-specific barcode is ligated to a plurality of different gene-specific primers hybridized to template nucleic acids, e.g., template RNAs, such as template mRNAs, where the GSPs may or may not be extended in a first round primer extension reaction, such as a first round of cDNA synthesis (i.e., a reverse transcription), at the time of ligation. As a result, the reverse GSPs and/or their primer extension products, e.g., first strand cDNA products, include a sample-specific barcode domain which originates from the donor nucleic acid. In other words, the sample-specific barcodes are transferred from initial sample-barcoded donor nucleic acids to reverse gene-specific primers that are hybridized to template nucleic acids, such as RNA template molecules, e.g., mRNA. Accordingly, these embodiments may be characterized in that the reverse GSPs are employed in the first strand synthesis step and the sample barcode domain and anchor domain of the donor nucleic acid are ligated to the reverse GSPs such that these domains are incorporated into the first strand synthesis products. A schematic illustration of an embodiment of this protocol is provided in FIG. 1.

In the embodiment illustrated in FIG. 1, a sample barcode initially present in a donor nucleic acid is transferred to a reverse GSP during first strand cDNA synthesis from an mRNA template. As shown in FIG. 1, the sample-barcoded donor nucleic acid has the structure: 3′-linker 1-sample barcode domain-anchor 2 domain-RNA binding domain-5′. The 5′ end of the donor nucleic acid is bonded to a solid support, specifically a bead. Also, as illustrated, the reverse GSPs include the structure: 3′-reverse GSP domain-linker 2-5′. In the first step of the protocol illustrated in FIG. 1, the donor nucleic acids and Rev GSPs are combined with the sample ribonucleic acids, e.g., an mRNA sample obtained from a single cell, under hybridization conditions such that the donor nucleic acids bind to the polyA tails of the mRNAs and the GSPs bind to their complementary domains of the mRNAs. Where desired, the resultant mRNA template/GSP/RT Primer complexes may be purified from any excess of unbound gene-specific primers and donor nucleic acids. In the illustrated embodiment shown in FIG. 1, the resultant complexes are immobilized on beads and can be purified from any excess of non-hybridized gene-specific primers using a convenient washing protocol, such as washing protocols known in the art. Alternatively, any excess of unbound oligonucleotides may be removed using other purification protocols, such as but not limited to: nuclease treatment, chromatography, size-dependent binding to a specific matrix, e.g., AMPure beads, etc.

Following hybridization of the GSPs and the donor nucleic acids to the mRNAs of the sample, the linker 1 and linker 2 domains of hybridized sample-barcoded donor nucleic acids and GSPs are ligated to produce sample-barcoded reverse primers, e.g., as illustrated in FIG. 1. To provide for sufficient proximity of the ends of the linker domains, a ligation linker may be employed. The ligation linker is an oligonucleotide that includes a first domain complementary to the linker 1 domain of the donor nucleic acid and a second domain complementary to the linker 2 domain of the reverse GSP. When employed, the length of the ligation linker may vary, and in some instances ranges from 15 to 60 nt, such as 20 to 50 nt, and including 24 to 40 nt. While not required, in some cases the ligation linker has a sequence with a GC-content in the range of 50% to 80%. The linker 1 and linker 2 domains may be ligated to each other using any convenient DNA ligase, where ligases that may be employed include, but are not limited to: Ampligase® Thermostable DNA Ligase, CircLigase-DNA Ligase, E. coli DNA ligase, T3 DNA ligase, T4 DNA ligase, T7 DNA ligase, Taq DNA ligase, and the like.
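By way of illustration only, the ligation linker described above may be sketched computationally. The following minimal Python sketch assumes that both linker domains are written 5′ to 3′, that linker 1 is the 3′-terminal domain of the donor nucleic acid and linker 2 is the 5′-terminal domain of the gene-specific primer, and that the ligation linker spans the full length of both domains. The sequences and function names are hypothetical and are not taken from the working examples.

```python
# Illustrative sketch: derive a ligation linker (splint) that bridges
# linker 1 and linker 2, then check it against the length (15-60 nt) and
# GC-content (50%-80%) ranges given in the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def design_ligation_linker(linker1: str, linker2: str) -> str:
    """Splint complementary to the linker 1 / linker 2 junction.

    Assumes the ligated product reads 5'-...linker1|linker2...-3'.
    """
    return reverse_complement(linker1.upper() + linker2.upper())


def check_ligation_linker(splint: str, min_len=15, max_len=60,
                          min_gc=0.50, max_gc=0.80) -> dict:
    """Report whether the splint falls in the stated length and GC ranges."""
    return {
        "length_ok": min_len <= len(splint) <= max_len,
        "gc_ok": min_gc <= gc_fraction(splint) <= max_gc,
    }


# Hypothetical linker domains (12 nt each), for illustration only.
LINKER_1 = "GCCGTCAGCTGG"
LINKER_2 = "CGACGGTCCAGC"

splint = design_ligation_linker(LINKER_1, LINKER_2)
report = check_ligation_linker(splint)
```

In this sketch the ligation linker is simply the reverse complement of the joined linker domains, so that hybridization leaves a nick at the linker 1/linker 2 junction for the ligase to seal; shorter ligation linkers covering only part of each domain would be an equally valid design choice.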

In the embodiment illustrated in FIG. 1, following ligation to produce sample-barcoded reverse GSPs, the methods include reverse transcribing first strand cDNA molecules from the sample-barcoded reverse primers. Reverse transcription may be accomplished by contacting the template RNA/sample-barcoded reverse GSP complexes with a reverse transcriptase under reverse transcription conditions, e.g., as described in greater detail below. Reverse transcription results in the production of a population of GSP primer extension products having a sample barcode domain and an anchor domain at their 5′ ends. Where desired, the donor nucleic acids may include a linker domain at the 3′-end that provides for ligation to the linker of the gene-specific primer but that cannot be extended by a reverse transcriptase, e.g., during the first strand cDNA synthesis step.

Following production of the population of GSP primer extension products, e.g., first strand cDNA molecules, as well as any desired purification or enrichment, e.g., to remove unhybridized GSPs, the resultant first strand cDNA molecules may be contacted with a population of forward GSPs, where the forward GSPs have an anchor domain at their 5′ ends and a gene-specific domain at their 3′ ends. Contact occurs under polymerase mediated primer extension reaction conditions, e.g., as described in further detail below, to produce forward GSP primed primer extension products, e.g., forward GSP primed second strand cDNA molecules. The resultant primer extension products may then be amplified, e.g., using universal primers that bind to the anchor domains, which results in production of a plurality of sample-barcoded anchor-domain-flanked double-stranded gene specific DNA fragments, e.g., as illustrated in FIG. 1. The resultant population of sample-barcoded anchor-domain-flanked double-stranded gene-specific DNA fragments may be further processed, e.g., further amplified, e.g., to add sequencing adaptors, etc., such as described in greater detail below.

In a variation of the protocol illustrated in FIG. 1, the sample barcode domain that is ligated to GSPs is not initially present in a donor nucleic acid that includes a capture domain, such as an RNA binding domain, but instead in a sample barcode donor nucleic acid which lacks a capture, e.g., RNA, binding domain. In such instances the donor nucleic acid may still include an anchor domain positioned 5′ of the sample barcode domain. In such instances, the sample barcode donor nucleic acid may be ligated to the GSPs at any convenient time, e.g., following hybridization of the GSPs to the template RNA, prior to GSP hybridization to template RNA, etc.

In another embodiment where a donor nucleic acid is employed, a circular nucleic acid intermediate molecule is produced to transfer a sample barcode and anchor domain to a gene-specific primer. In examples of this embodiment, a forward GSP primed primer extension product, i.e., forward GSP primed second strand cDNA molecule, is circularized to transfer a sample barcode domain from an initial sample barcoded donor nucleic acid to the forward GSP, e.g., as illustrated in FIG. 2. As shown in the protocol illustrated in FIG. 2, the sample-barcoded donor nucleic acid comprises the structure: 3′-RNA binding domain-anchor 1 domain-sample barcode domain-linker 1 domain-5′. The RNA-binding domain may vary as desired, where examples of such domains include oligo dT domains, random sequence domains, and semi-random sequence domains which may be configured to interact and bind to conservative or common regions in RNA, e.g., polyA, short random sequences (e.g., N6) or short conservative sequences (e.g., nucleotide triplets coding the most abundant amino acids). The anchor 1, sample barcode and linker 1 domains may be as described above. Also, as described above, the donor nucleic acid may include a UMI domain, e.g., linked to the barcode domain, such as described above.

As illustrated in FIG. 2, the donor nucleic acid is employed to prime first strand cDNA synthesis, e.g., by contacting the RNA sample with the sample-barcoded donor nucleic acid under conditions sufficient to reverse transcribe first strand cDNA molecules from the RNA source, e.g., as reviewed in greater detail below. The resultant first strand cDNA molecules (i.e., donor nucleic acid primed primer extension products) are then contacted with a population of forward GSPs, where the forward GSPs include a linker domain and have the structure: 3′-forward GSP domain-linker 2-5′. Contact of the first strand cDNA molecule with the forward GSPs occurs under polymerase mediated primer extension reaction conditions sufficient to produce second strand cDNA molecules comprising a 5′ linker 2 domain and a 3′ linker 1 domain flanking a forward GSP primed domain, e.g., as illustrated in FIG. 2.

Following forward GSP mediated second strand cDNA synthesis, the flanking linker 1 and linker 2 domains are ligated to produce a circular intermediate, e.g., as illustrated in FIG. 2. To provide for sufficient proximity of the ends of the linker domains, a ligation linker may be employed. As reviewed above, the ligation linker is an oligonucleotide that includes a first domain complementary to the linker 1 domain originating from the donor nucleic acid (i.e., the RT primer) and a second domain complementary to the linker 2 domain originating from the forward GSP. When employed, the length of the ligation linker may vary, and in some instances ranges from 15 to 60 nt, such as 20 to 50 nt, and including 24 to 40 nt. While not required, in some cases the ligation linker has a sequence with a GC-content in the range of 50% to 80%. The linker 1 and linker 2 domains may be ligated to each other using any convenient ligase, where ligases that may be employed include, but are not limited to: Ampligase® Thermostable DNA Ligase, CircLigase-DNA Ligase, E. coli DNA ligase, T3 DNA ligase, T4 DNA ligase, T7 DNA ligase, Taq DNA ligase, RNA ligase, and the like. As a result of this ligation step and consequent circularization of the second strand cDNA molecule, the anchor and sample barcode domains from the initial donor nucleic acid are ligated 5′ of the forward GSP. The resultant circularized second strand cDNA molecules are then contacted with the reverse GSPs under primer extension reaction conditions, e.g., as described in greater detail below. The reverse GSPs have the structure: 5′-anchor 2-reverse GSP domain-3′, and contact under primer extension reaction conditions produces reverse GSP primed primer extension products that include an anchor domain at their 5′ ends. The resultant primer extension products may then be amplified, e.g., using universal primers that bind to the anchor domains, which results in production of a plurality of sample-barcoded anchor-domain-flanked double-stranded gene specific DNA fragments, e.g., as illustrated in FIG. 2. The resultant population of sample-barcoded anchor-domain-flanked double-stranded gene-specific DNA fragments may be further processed, e.g., further amplified, e.g., to add sequencing adaptors, etc., such as described in greater detail below.

Sample Barcoded GSP Mediated Protocols

As summarized above, embodiments of the methods employ gene specific primers that include a sample barcode domain. Specifically, in such embodiments at least one of the forward or reverse primers of the set of gene specific primers includes a sample barcode domain. In some instances, the reverse gene specific primers of the set, i.e., the reverse gene specific primer subset, include a sample barcode domain. In some instances, the forward gene specific primers of the set, i.e., the forward gene specific primer subset, include a sample barcode domain. Within a given subset, while the gene specific domain will vary, the sample barcode domain is the same. In some instances, both the forward gene specific primers and the reverse gene specific primers include a barcode domain. In some instances, the barcode domain is 5′ of the gene specific domain. In some instances where an anchor domain is also present, the anchor domain is 5′ of the sample barcode domain. Other domains such as described above, e.g., UMI domain, etc., may also be present as desired. The length of the selected PCR primers may vary, and in some embodiments the reverse barcoded primer is longer than the forward primer. A reverse barcoded primer that is involved in a stringent hybridization step with the target RNA template composition may have a size in the range of 16 to 120 nt, such as 20 to 80 nt, including 25 to 60 nt, and 30 to 50 nt. Forward gene specific primers in some instances range from 16 to 25 nt, such as 18 to 24 nt.
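The primer layout described above may be illustrated with a short Python sketch. The sketch assumes a reverse primer subset with the layout 5′-anchor-sample barcode-gene specific domain-3′, in which the gene specific domain varies and the sample barcode is shared. All sequences, target names and function names are hypothetical placeholders chosen for illustration.

```python
# Illustrative sketch: assemble a subset of sample-barcoded reverse primers
# with the layout 5'-anchor-sample barcode-GSP-3' and check each primer
# against the 16-120 nt range stated for reverse barcoded primers.

def build_barcoded_reverse_primers(anchor: str, sample_barcode: str,
                                   gsp_domains: dict) -> dict:
    """Return {target: primer}; every primer shares the same barcode."""
    prefix = anchor.upper() + sample_barcode.upper()
    return {target: prefix + gsp.upper() for target, gsp in gsp_domains.items()}


def length_in_range(primer: str, min_len: int = 16, max_len: int = 120) -> bool:
    """True if the primer length falls within the given range (in nt)."""
    return min_len <= len(primer) <= max_len


# Hypothetical sequences, for illustration only.
ANCHOR_2 = "GTGACTGGAGTTCAGACG"
CELL_BARCODE = "ACGTTGCA"
REV_GSPS = {
    "GENE_A": "TTGGCCAATCGGATCCATGA",
    "GENE_B": "CAGGTTCAAGCTTGGACTAC",
}

primers = build_barcoded_reverse_primers(ANCHOR_2, CELL_BARCODE, REV_GSPS)
all_in_range = all(length_in_range(p) for p in primers.values())
```

Preparing the same subset for a second sample would, under these assumptions, only require substituting a different value for `CELL_BARCODE`.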

The sample barcode containing gene specific primers employed in these embodiments may be prepared using any convenient protocol, including nucleic acid synthesis protocols, which protocols may or may not include a ligation step, e.g., as described in the working exemplification below. In some embodiments, the sample specific barcodes are attached to a set of gene specific primers via a ligation reaction, such as mediated by DNA ligase activity. In ligation reactions, two oligonucleotides are ligated to each other using enzymes having ligation activity, such as DNA ligases, Circligase, RNA ligases, etc. The sample-specific barcodes may be ligated directly to the gene-specific primers using single-strand ligases, e.g., Circligase, RNA ligase, etc. In another embodiment, the sample barcodes may be ligated to the gene-specific primers via formation of double-stranded intermediate products, and DNA ligases like T4 DNA ligase, Tth DNA ligase, Taq DNA ligase, etc., may be employed. The double-stranded intermediate products may be formed by using oligonucleotides complementary to the 5′-end of gene-specific primers and the 3′-end of barcoded oligonucleotides. In order to simplify the ligation reaction composition, specific sequences may be provided at both the 5′-ends and 3′-ends of the ligated molecules, e.g., linker domains disclosed in the current application. If both gene-specific primers and barcoded oligonucleotides have linker domains, adding an oligonucleotide complementary to both linker domains will form double-stranded intermediates. These double-stranded intermediates with a nick between the two linker domains may then be ligated by DNA ligase. The linker domains of gene-specific primers may be phosphorylated. The labelling of gene-specific primers with sample-specific barcodes could be performed under gene-specific primer extension conditions, including adding reagents necessary for ligase activity, e.g., NAD, ATP, etc. In another embodiment, the barcode labeling conditions could be different from gene-specific primer extension conditions and use reaction buffer compositions specific for a particular DNA or RNA ligase. If all oligonucleotides are present in solution (e.g., in the wells of microtiter plates), the ligase-mediated barcoding reaction could be easily scaled up for labeling sets of thousands of gene specific primers with hundreds of different unique sample-specific barcodes (in separate wells).

Sample barcoded primers at a significantly larger scale (up to several million) may be achieved if barcoded oligonucleotides are attached to the bead surface. Synthesis of barcoded oligonucleotides on the bead surface using combinatorial chemistry (pool and split synthesis approach) is a technology well known in the art and commonly applied for single-cell analysis. For most single-cell RNA expression profiling assays developed so far, oligonucleotides with the structure: 5′-Anchor-Barcode-oligo dT-3′ are synthesized on the beads, wherein the barcode is a combinatorial barcode unique for each bead and the oligo dT domain is a universal sequence used to prime cDNA synthesis from the polyA tail of all mRNAs present in a biological sample. The current invention uniquely allows the combination of known-in-art combinatorial bead-based barcode synthesis technology with a DNA ligation assay for barcode labelling and immobilization of thousands of gene specific primers for each bead. The barcoded gene specific primers immobilized on the beads will be a unique experimental tool for high-throughput single-cell targeted expression profiling of hundreds to thousands of gene targets. As opposed to genome-wide barcode labeling of cDNAs using the oligo dT-barcode strategy, the targeted labeling of a subset of target transcripts addresses the main limitations (cost, throughput, quality of data) of current single-cell analysis technologies.
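The scale attainable with pool and split combinatorial barcode synthesis follows from simple counting: if each synthesis round adds one of n barcode blocks, k rounds yield n^k distinct bead barcodes. The Python sketch below is a hedged illustration under that assumption; the block sequences, round count and function names are hypothetical and do not describe any particular commercial bead chemistry.

```python
# Illustrative sketch: enumerate combinatorial barcodes built from one
# block per pool-and-split round, and count the resulting barcode space.
from itertools import product


def combinatorial_barcodes(rounds: list) -> list:
    """Concatenate one block from each round, in round order."""
    return ["".join(blocks) for blocks in product(*rounds)]


def barcode_space_size(blocks_per_round: int, n_rounds: int) -> int:
    """Number of distinct barcodes for n_rounds rounds of equal size."""
    return blocks_per_round ** n_rounds


# Hypothetical 4-nt blocks, three per round, three rounds.
ROUNDS = [
    ["ACGT", "TGCA", "GATC"],
    ["CATG", "GTAC", "TCGA"],
    ["AGCT", "CTAG", "TTAA"],
]

barcodes = combinatorial_barcodes(ROUNDS)    # 3 ** 3 barcodes
plate_scale = barcode_space_size(96, 3)      # three rounds of 96 blocks
```

With three rounds of 96 blocks each, the barcode space in this sketch is 96 × 96 × 96 = 884,736, and a fourth round would bring it into the range of tens of millions.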

In another pool-split chemical synthesis embodiment, thousands of gene-specific primers are synthesized on the bead surface using conventional phosphoramidite chemistry in the 3′-to-5′ direction. All bead-immobilized gene specific primers are then mixed together and split into several (e.g., hundreds to thousands of) compartments, wherein each compartment comprises the same pool of gene specific primers. The gene specific primer set immobilized on beads in each compartment is used in the next step of barcode synthesis, wherein a unique sample specific barcode is synthesized in each compartment. As a result of this pool-split chemical synthesis strategy, hundreds to thousands of sets of gene specific primers will be encoded with specific barcodes. The barcoded gene specific primer sets could be released from the beads and used in the disclosed primer extension assay.

A schematic illustration of an embodiment of a protocol employing sample barcoded reverse gene specific primers is provided in FIG. 3. In the embodiment illustrated in FIG. 3, reverse gene specific primers are employed that include a 3′ reverse gene specific primer domain (denoted RevGSP), a sample barcode domain (denoted as Cell Barcode) positioned 5′ of the gene specific primer domain, and a 5′ anchor domain (denoted Anchor 2) that is 5′ of the sample barcode domain. In the first step of the protocol illustrated in FIG. 3, the reverse gene specific primers are combined with the sample ribonucleic acids, e.g., an mRNA sample obtained from a single cell, under hybridization conditions such that the reverse gene specific primers bind to their complementary domains of the mRNAs. Where desired, the resultant mRNA template/GSP complexes of the resultant hybrid composition may be purified from any excess of unbound gene-specific primers, e.g., using any convenient protocol. For example, any excess of unbound oligonucleotides may be removed using purification protocols, such as but not limited to: nuclease treatment (e.g., exonuclease I treatment), chromatography, size-dependent binding to a specific matrix, e.g., AMPure beads, etc.

In the embodiment illustrated in FIG. 3, following hybridization of the reverse gene specific primers comprising the sample barcode domain and anchor domain to mRNAs of the sample, the methods include reverse transcribing first strand cDNA molecules from the sample-barcoded reverse primers. Reverse transcription may be accomplished by contacting the template RNA/sample-barcoded reverse GSP complexes with a reverse transcriptase under reverse transcription conditions, e.g., as described in greater detail below. Reverse transcription results in the production of a population of GSP primer extension products having a sample barcode domain and an anchor domain at their 5′ ends.

As illustrated in FIG. 3, following production of the population of GSP primer extension products, e.g., first strand cDNA molecules, as well as any desired purification or enrichment, e.g., to remove unhybridized GSPs, the resultant first strand cDNA molecules may be contacted with a population of forward GSPs, where the forward GSPs have an anchor domain at their 5′ ends and a gene-specific domain at their 3′ ends. Contact occurs under polymerase mediated primer extension reaction conditions, e.g., as described in further detail below, to produce forward GSP primed primer extension products, e.g., forward GSP primed second strand cDNA molecules. Following any desired purification step, e.g., to remove unbound forward GSPs, the resultant primer extension products may then be amplified, e.g., using universal primers that bind to the anchor domains, which results in production of a plurality of sample-barcoded anchor-domain-flanked double-stranded gene specific DNA fragments, e.g., as illustrated in FIG. 3. The resultant population of sample-barcoded anchor-domain-flanked double-stranded gene-specific DNA fragments may be further processed, e.g., further amplified, e.g., to add sequencing adaptors, etc., such as described in greater detail below.

In the embodiment illustrated in FIG. 4, reverse gene specific primers as employed in the protocol described for FIG. 3 are employed in conjunction with oligo-dT beads, such as described above. In the first step of the protocol illustrated in FIG. 4, the reverse gene specific primers and oligo-dT beads are combined with the sample ribonucleic acids, e.g., an mRNA sample obtained from a single cell, under hybridization conditions such that the reverse gene specific primers bind to their complementary domains of the mRNAs and the oligo-dT labels of the beads bind to the polyA tails of the mRNAs. Where desired, the resultant mRNA template/GSP/bead complexes of the resultant hybrid composition may be purified from any excess of unbound gene-specific primers, e.g., using any convenient protocol. In the illustrated embodiment shown in FIG. 4, the resultant complexes are immobilized on beads and can be purified from any excess of non-hybridized gene-specific primers using a convenient washing protocol, such as washing protocols known in the art.

In the embodiment illustrated in FIG. 4, following hybridization of the reverse gene specific primers comprising the sample barcode domain and anchor domain to mRNAs of the sample, the methods include reverse transcribing first strand cDNA molecules from the sample-barcoded reverse primers. Reverse transcription may be accomplished by contacting the template RNA/sample-barcoded reverse GSP complexes with a reverse transcriptase under reverse transcription conditions, e.g., as described in greater detail below. Reverse transcription results in the production of a population of GSP primer extension products having a sample barcode domain and an anchor domain at their 5′ ends.

As illustrated in FIG. 4, following production of the population of GSP primer extension products, e.g., first strand cDNA molecules, as well as any desired purification or enrichment, e.g., to remove unhybridized GSPs, the resultant first strand cDNA molecules may be contacted with a population of forward GSPs, where the forward GSPs have an anchor domain at their 5′ ends and a gene-specific domain at their 3′ ends. Contact occurs under polymerase mediated primer extension reaction conditions, e.g., as described in further detail below, to produce forward GSP primed primer extension products, e.g., forward GSP primed second strand cDNA molecules. The resultant primer extension products may then be amplified, e.g., using universal primers that bind to the anchor domains, which results in production of a plurality of sample-barcoded anchor-domain-flanked double-stranded gene specific DNA fragments, e.g., as illustrated in FIG. 4. The resultant population of sample-barcoded anchor-domain-flanked double-stranded gene-specific DNA fragments may be further processed, e.g., further amplified, e.g., to add sequencing adaptors, etc., such as described in greater detail below.

In some instances, the sample barcoded reverse primers are linked to a solid support. The solid support may vary, where examples of solid supports include beads, wells, plates, etc., e.g., made of a suitable solid phase material, such as a polymeric material, where the surface is configured to provide the desired bond to the reverse primers. In some instances, the reverse gene specific primer is linked to the solid support by a cleavable linker, i.e., a linker that may be broken in response to an applied stimulus. In such instances, any convenient cleavable linker may be employed. Examples of cleavable linkers that may be employed include, but are not limited to, thermal-labile linkers, enzymatically-labile linkers, chemically-labile linkers, light-labile linkers, etc.

In some instances, the linker is a thermal labile linker that includes a thermally-labile blocking moiety. A thermally-labile blocking moiety is a moiety that may be cleaved when the temperature of the primer is raised above a certain threshold value. While the threshold value may vary, in some instances the threshold value is 60° C. or higher, such as 75° C. or higher, including 90° C. or higher. Examples of thermally labile moieties that may be employed in accordance with the invention include, but are not limited to, those described in U.S. Pat. Nos. 8,133,669 and 8,361,753; the disclosures of which are herein incorporated by reference. In some instances, the thermally labile blocking moiety is a 3′ blocking moiety, such as but not limited to: O-phenoxyacetyl; O-methoxyacetyl; O-acetyl; O-(p-toluene)sulfonate; O-phosphate; O-nitrate; O-[4-methoxy]-tetrahydrothiopyranyl; O-tetrahydrothiopyranyl; O-[5-methyl]-tetrahydrofuranyl; O-[2-methyl, 4-methoxy]-tetrahydropyranyl; O-[5-methyl]-tetrahydropyranyl; and O-tetrahydrothiofuranyl.

In some instances, the linker is an enzymatically-labile linker. An enzymatically-labile linker includes a moiety that may be cleaved by exposing the linker to a suitable enzyme that cleaves the moiety. Examples of enzymatically-labile moieties of interest include those having a linkage group cleavable by a hydrolase enzyme. Examples of hydrolase enzymes of interest include, but are not limited to: esterases, phosphatases, peptidases, penicillin amidases, glycosidases and phosphorylases, kinases, etc. Hydrolase susceptible linkages and hydrolase enzymes are further described in U.S. Patent Application Publication No. 20050164182 and U.S. Pat. No. 7,078,499; the disclosures of which are herein incorporated by reference.

In some instances, the linker is a chemically-labile linker that includes a chemically-labile moiety. A chemically-labile moiety is a moiety that may be cleaved by exposing the linker to a chemical agent that cleaves the moiety. The chemically-labile moiety may be reactive with the functional group of a chemical agent (e.g., an azido-containing modifiable group that is reactive with an alkynyl-containing reagent or a phosphine reagent, or vice versa, or a disulfide that is reactive with a reducing agent such as tris(2-carboxyethyl)phosphine (TCEP) or DTT). A variety of functional group chemistries and chemical agent stimuli suitable for modifying them may be utilized in the subject methods. Functional group chemistries and chemical agents of interest include, but are not limited to, click chemistry groups and reagents (e.g., as described by Sharpless et al., (2001), “Click Chemistry: Diverse Chemical Function from a Few Good Reactions”, Angewandte Chemie International Edition 40 (11): 2004-2021), Staudinger ligation groups and reagents (e.g., as described by Bertozzi et al., (2000), “Cell Surface Engineering by a Modified Staudinger Reaction”, Science 287 (5460): 2007), and other bioconjugation groups and reagents (e.g., as described by Hermanson, Bioconjugate Techniques, Second Edition, Academic Press, 2008). In certain embodiments, the chemically-labile blocking moiety includes a functional group selected from an azido, a phosphine (e.g., a triaryl phosphine or a trialkyl phosphine or mixtures thereof), a dithiol, an active ester, an alkynyl, a protected amino, a protected hydroxy, a protected thiol, a hydrazine, and a disulfide.

In some instances, the cleavable linker is a light-labile linker that includes a light-labile moiety, which is a moiety that may be cleaved by exposing the linker to light at a wavelength that cleaves the moiety from the linker. Examples of light-labile moieties of interest include those cleavable by light of a certain wavelength that cleaves a photocleavable group in the linkage group. Any convenient photocleavable groups may find use. Cleavable groups and linkers may include photocleavable groups comprising covalent bonds that break upon exposure to light of a certain wavelength. Suitable photocleavable groups and linkers for use in the subject MCIPs include ortho-nitrobenzyl-based linkers, phenacyl linkers, alkoxybenzoin linkers, chromium arene complex linkers, NpSSMpact linkers and pivaloylglycol linkers, as described in Guillier et al. (Chem. Rev. 2000 100:2091-2157). For example, a 1-(2-nitrophenyl)ethyl-based photocleavable linker (Ambergen) can be efficiently cleaved using near-UV light, e.g., achieving >90% yield in 5-10 minutes using a 365 nm peak lamp at 1-5 mW/cm². In some embodiments, the modifiable group is a photocleavable group such as a nitro-aryl group, e.g., a nitro-indole group or a nitro-benzyl group, including but not limited to: 2-nitroveratryloxycarbonyl, α-carboxy-2-nitrobenzyl, 1-(2-nitrophenyl)ethyl, 1-(4,5-dimethoxy-2-nitrophenyl)ethyl and 5-carboxymethoxy-2-nitrobenzyl. Nitro-indole groups of interest include, e.g., a 3-nitro-indole, a 4-nitro-indole, a 5-nitro-indole, a 6-nitro-indole or a 7-nitro-indole group, where the indole ring may be further substituted at any suitable position, e.g., with a methyl group or a halo group (e.g., a bromo or chloro), e.g., at the 3-, 5- or 7-position. In certain embodiments, the nitro-aryl group is a 7-nitro indolyl group. In certain instances, the 7-nitro indolyl group is further substituted with a substituent that increases the photoactivity of the group, e.g., substituted with a bromo at the 5-position. Any convenient photochemistry of nitroaryl groups may be adapted for use. In certain embodiments, the linker includes a photocleavable group, such as a nitro-benzyl protecting group or a nitro-indolyl group.

An example of a protocol that employs sample barcoded reverse gene specific primers linked to a solid support by a cleavable linker is illustrated in FIG. 5. In the protocol illustrated in FIG. 5, template ribonucleic acids from two cells are employed, where the protocol employs a pooling step and sample barcodes to match the results to the cellular source. As illustrated in FIG. 5, reverse gene specific primers are employed that include a 3′ reverse gene specific primer domain (denoted RevGSP), a non-cleavable linker domain (denoted Linker) that is 5′ of the RevGSP, a sample barcode domain (denoted as Barcode 1 or Barcode 2) positioned 5′ of the non-cleavable linker, a 5′ anchor domain (denoted Anchor 1) that is 5′ of the sample barcode domain, and a bead linked to the 5′ end of the Anchor 1 domain by a cleavable linker (shown as an X).

In the first step of the protocol illustrated in FIG. 5, the sample barcoded reverse gene specific primers that include the Barcode 1 sample barcode are combined with a first cell and encapsulated in a first droplet, e.g., as described below. The sample barcoded reverse gene specific primers that include the Barcode 2 sample barcode are combined with a second cell and encapsulated in a second droplet. The cells in the first and second droplets are then lysed and the beads are removed from the reverse primers by cleaving the cleavable linker. In each of the first and second droplets, the resultant cleaved reverse primers are maintained with the liberated mRNAs obtained from the lysed cells under hybridization conditions such that the reverse gene specific primers bind to their complementary domains of the mRNAs.

As illustrated in FIG. 5, the first and second droplets containing the resultant hybrid composition of the mRNA template/GSP complexes are then combined or pooled into a single composition. Following this pooling step, the combined hybrid composition can be purified from any excess of unbound gene-specific primers, e.g., using any convenient protocol. For example, any excess of unbound oligonucleotides may be removed using purification protocols, such as but not limited to: nuclease treatment (e.g., exonuclease I treatment), chromatography, size-dependent binding to a specific matrix, e.g., AMPure beads, etc.

In the embodiment illustrated in FIG. 5, following purification, the methods include reverse transcribing first strand cDNA molecules from the sample-barcoded reverse primers. As reviewed above, reverse transcription may be accomplished by contacting the template RNA/sample-barcoded reverse GSP complexes with a reverse transcriptase under reverse transcription conditions, e.g., as described in greater detail below. Reverse transcription results in the production of a population of GSP primer extension products having a sample barcode domain and an anchor domain at their 5′ ends.

Following production of the population of GSP primer extension products, e.g., first strand cDNA molecules, as well as any desired purification or enrichment, e.g., to remove unhybridized GSPs, the resultant first strand cDNA molecules may be contacted with a population of forward GSPs, where the forward GSPs have an anchor domain at their 5′ ends and a gene-specific domain at their 3′ ends. Contact occurs under polymerase mediated primer extension reaction conditions, e.g., as described in further detail below, to produce forward GSP primed primer extension products, e.g., forward GSP primed second strand cDNA molecules. Following any desired purification, e.g., to remove unbound forward GSPs, the resultant primer extension products may then be amplified, e.g., using universal primers that bind to the anchor domains, which results in production of a plurality of sample-barcoded anchor-domain-flanked double-stranded gene specific DNA fragments. The resultant population of sample-barcoded anchor-domain-flanked double-stranded gene-specific DNA fragments may be further processed, e.g., further amplified, e.g., to add sequencing adaptors, etc., such as described in greater detail below, e.g., for performing NGS.
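Because the droplets are pooled before reverse transcription, the sample barcode is what later allows each sequenced fragment to be matched to its cellular source. The Python sketch below illustrates this matching step under simplifying assumptions that are not part of the disclosed protocol: each read is assumed to begin directly with a fixed-length sample barcode, matching is exact, and the barcode sequences, cell labels and function name are hypothetical.

```python
# Illustrative sketch: assign pooled reads to their cell of origin by
# exact match of a fixed-length sample barcode at the start of each read.
from collections import defaultdict


def demultiplex(reads: list, barcode_to_sample: dict,
                barcode_len: int, barcode_start: int = 0) -> dict:
    """Group the post-barcode portion of each read by sample label.

    Reads whose barcode is not in barcode_to_sample go to "unassigned".
    """
    by_sample = defaultdict(list)
    end = barcode_start + barcode_len
    for read in reads:
        sample = barcode_to_sample.get(read[barcode_start:end], "unassigned")
        by_sample[sample].append(read[end:])
    return dict(by_sample)


# Hypothetical 8-nt sample barcodes standing in for Barcode 1 and Barcode 2.
BARCODES = {"AAAACCCC": "cell_1", "GGGGTTTT": "cell_2"}

POOLED_READS = [
    "AAAACCCC" + "TTGACA",
    "GGGGTTTT" + "CCATGA",
    "AAAACCCC" + "CCATGA",
    "ACACACAC" + "TTGACA",   # barcode not in the table
]

assigned = demultiplex(POOLED_READS, BARCODES, barcode_len=8)
```

A practical pipeline would typically also tolerate sequencing errors in the barcode, e.g., by allowing a bounded number of mismatches; exact matching is used here only to keep the illustration short.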

In some instances where the sample barcoded reverse primers are bound to a solid support, such as a bead, by a cleavable linker (e.g., as described above), the solid support may further include a specific binding pair member, e.g., that includes a specific binding domain that specifically binds to a marker of a cell of interest. Specific binding domains of interest include, but are not limited to, antibody binding agents, proteins, peptides, haptens, nucleic acids, etc. The term “antibody binding agent” as used herein includes polyclonal or monoclonal antibodies or fragments that are sufficient to bind to an analyte of interest. The antibody fragments can be, for example, monomeric Fab fragments, monomeric Fab′ fragments, or dimeric F(ab′)2 fragments. Also within the scope of the term “antibody binding agent” are molecules produced by antibody engineering, such as single-chain antibody molecules (scFv) or humanized or chimeric antibodies produced from monoclonal antibodies by replacement of the constant regions of the heavy and light chains to produce chimeric antibodies or replacement of both the constant regions and the framework portions of the variable regions to produce humanized antibodies. The marker of the cell of interest may be any convenient marker, such as a cell surface protein or structure having an epitope to which the specific binding domain may specifically bind. In such instances, the bead linked sample barcoded reverse primers may include one or more additional domains of interest, such as bead identifying domains (bead barcodes), antibody identifying domains (antibody barcodes), etc.

An example of a protocol that employs sample barcoded reverse gene specific primers linked to a solid support by a cleavable linker, where the support includes a cell specific binding domain, is illustrated in FIG. 6. In the protocol illustrated in FIG. 6, template ribonucleic acids from two cells are employed, where the protocol employs a pooling step and sample barcodes are employed to match the results to the cellular source and antibody employed. As illustrated in FIG. 6, reverse gene specific primers are employed that include a 3′ reverse gene specific primer domain (denoted RevGSP), a non-cleavable linker domain (denoted Linker) that is 5′ of the RevGSP, a sample barcode domain (denoted as Barcode 1 or Barcode 2) positioned 5′ of the non-cleavable linker, an antibody barcode domain (denoted ab1 or ab2) positioned 5′ of the sample barcode domain, a 5′ anchor domain (denoted Anchor 1) that is 5′ of the sample barcode domain, a bead linked to the 5′ end of the Anchor 1 domain by a cleavable linker (shown as an X) and an antibody on the bead that specifically binds to a cell surface antigen.
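The domain order recited for the FIG. 6 reverse primer can be sketched in code as follows. This is a minimal illustration only: the class name, field names and the short placeholder sequences are hypothetical and are not taken from the disclosure, and the bead, cleavable linker and antibody are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReversePrimer:
    """Hypothetical model of the reverse GSP domains, listed 5' to 3'."""
    anchor: str            # Anchor 1 domain (5' end; bead attaches here via the cleavable linker)
    antibody_barcode: str  # ab1 or ab2
    sample_barcode: str    # Barcode 1 or Barcode 2
    linker: str            # non-cleavable Linker domain
    rev_gsp: str           # RevGSP domain (3' end)

    def domains(self):
        """Return (name, sequence) pairs in 5' to 3' order."""
        return [
            ("Anchor 1", self.anchor),
            ("ab barcode", self.antibody_barcode),
            ("sample barcode", self.sample_barcode),
            ("Linker", self.linker),
            ("RevGSP", self.rev_gsp),
        ]

    def sequence(self) -> str:
        """Concatenate the domains into one 5' to 3' oligonucleotide string."""
        return "".join(seq for _, seq in self.domains())

    def domain_spans(self) -> dict:
        """Map each domain name to its 0-based (start, end) slice in the oligo."""
        spans, pos = {}, 0
        for name, seq in self.domains():
            spans[name] = (pos, pos + len(seq))
            pos += len(seq)
        return spans


# Placeholder sequences chosen only for illustration.
primer = ReversePrimer("AAAA", "CC", "GGG", "TT", "ACGT")
print(primer.sequence())
print(primer.domain_spans())
```

Keeping the domains as separate fields, rather than a single string, makes it straightforward to swap the sample barcode or antibody barcode while holding the anchor and gene specific domains constant.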

In the first step of the protocol illustrated in FIG. 6, the sample barcoded reverse gene specific primers that include the Barcode 1 sample barcode and ab1 barcode (collectively denoted 1ab1) are combined with a first cell under conditions sufficient for the antibody on the bead to specifically bind to the antigen on the cell, and the resultant binding complex is encapsulated in a first droplet. The sample barcoded reverse gene specific primers that include the Barcode 2 sample barcode and ab2 barcode (collectively denoted 2ab2) are combined with a second cell under conditions sufficient for the antibody on the bead to specifically bind to the antigen on the cell, and the resultant binding complex is encapsulated in a second droplet. The cells in the first and second droplets are then lysed and the beads are removed from the reverse primers by cleaving the cleavable linker. In each of the first and second droplets, the resultant cleaved reverse primers are maintained with the liberated mRNAs obtained from the lysed cells under hybridization conditions such that the reverse gene specific primers bind to their complementary domains of the mRNAs.

As illustrated in FIG. 6, the first and second droplets containing the resultant hybrid composition of the mRNA template/GSP complexes are then combined or pooled into a single composition. Following this pooling step, the combined hybrid compositions are purified from any excess of unbound gene-specific primers, e.g., using any convenient protocol. For example, any excess of unbound oligonucleotides may be removed using purification protocols, such as but not limited to: nuclease treatment (e.g., exonuclease I treatment), chromatography, size-dependent binding to a specific matrix, e.g., AMPure beads, etc.

In the embodiment illustrated in FIG. 6, following purification, the methods include reverse transcribing first strand cDNA molecules from the sample-barcoded reverse primers. As reviewed above, reverse transcription may be accomplished by contacting the template RNA/sample-barcoded reverse GSP complexes with a reverse transcriptase under reverse transcription conditions, e.g., as described in greater detail below. Reverse transcription results in the production of a population of GSP primer extension products having a sample barcode domain and an anchor domain at their 5′ ends.

Following production of the population of GSP primer extension products, e.g., first strand cDNA molecules, as well as any desired purification or enrichment, e.g., to remove unhybridized GSPs, the resultant first strand cDNA molecules may be contacted with a population of forward GSPs, where the forward GSPs have an anchor domain at their 5′ ends and a gene-specific domain at their 3′ ends. Contact occurs under polymerase mediated primer extension reaction conditions, e.g., as described in further detail below, to produce forward GSP primed primer extension products, e.g., forward GSP primed second strand cDNA molecules. Following any desired purification, e.g., to remove unbound forward GSPs, the resultant primer extension products may then be amplified, e.g., using universal primers that bind to the anchor domains, which results in production of a plurality of sample-barcoded anchor-domain-flanked double-stranded gene specific DNA fragments. The resultant population of sample-barcoded anchor-domain-flanked double-stranded gene-specific DNA fragments may be further processed, e.g., further amplified, e.g., to add sequencing adaptors, etc., such as described in greater detail below, e.g., for performing NGS.
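Because the droplets are pooled before reverse transcription, sequence reads obtained from the resultant fragments must be matched back to their cellular source by the barcode tag. The following is a minimal sketch of such an assignment. It assumes, purely for illustration, that each read begins with the anchor sequence followed immediately by the combined antibody barcode and sample barcode tag (e.g., 1ab1 or 2ab2), that all tags have the same length, and that only exact matches are accepted. The function name and the sequences are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, anchor, tags):
    """Group reads by the barcode tag that directly follows the anchor.

    reads  : iterable of read strings (5' to 3')
    anchor : anchor sequence expected at the start of each read
    tags   : dict mapping a source name to its combined barcode tag
    Reads lacking the anchor or a known tag go to "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        source = "unassigned"
        if read.startswith(anchor):
            rest = read[len(anchor):]
            for name, tag in tags.items():
                if rest.startswith(tag):
                    source = name
                    break
        bins[source].append(read)
    return dict(bins)


# Placeholder anchor, tags and reads for illustration only.
anchor = "ACGTAC"
tags = {"cell_1": "AAGG", "cell_2": "CCTT"}
reads = ["ACGTACAAGGTTTT", "ACGTACCCTTGGGG", "ACGTACGGGGAAAA", "TTTTTT"]
for source, members in demultiplex(reads, anchor, tags).items():
    print(source, members)
```

Exact matching is the simplest choice; a practical pipeline would typically tolerate sequencing errors in the tag, which is outside the scope of this sketch.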

Template Nucleic Acids

Components of the subject reaction mixtures may include one or more template nucleic acids. Such template nucleic acids provide the template from which template nucleic acid-mediated primer extension reactions and other nucleic acid production reactions may be performed. Nucleic acid templates may be added to a reaction mixture, e.g., through direct addition of the nucleic acid template, through lysing one or more cells containing the nucleic acid template, and the like, or one or more nucleic acid templates may be generated during the reaction, e.g., as an intermediate product of a prior nucleic acid production reaction. Essentially any nucleic acid template may find use in the subject methods, including, e.g., RNA template nucleic acids and DNA template nucleic acids. RNA template nucleic acids may vary and may include, e.g., messenger RNA (mRNA) templates, and the like. In addition, various types of DNA templates may be employed, including but not limited to, e.g., genomic DNA templates, mtDNA templates, synthetic DNA templates, etc.

According to certain embodiments, the template nucleic acids are template ribonucleic acids (template RNA). Template RNAs may be any type of RNA (or sub-type thereof) including, but not limited to, a messenger RNA (mRNA), a microRNA (miRNA), a small interfering RNA (siRNA), a trans-acting small interfering RNA (ta-siRNA), a natural small interfering RNA (nat-siRNA), a ribosomal RNA (rRNA), a transfer RNA (tRNA), a small nucleolar RNA (snoRNA), a small nuclear RNA (snRNA), a long non-coding RNA (lncRNA), a non-coding RNA (ncRNA), a transfer-messenger RNA (tmRNA), a precursor messenger RNA (pre-mRNA), a small Cajal body-specific RNA (scaRNA), a piwi-interacting RNA (piRNA), an endoribonuclease-prepared siRNA (esiRNA), a small temporal RNA (stRNA), a signal recognition RNA, a telomere RNA, a ribozyme, or any combination of RNA types thereof or subtypes thereof.

According to certain embodiments, the template nucleic acids are template deoxyribonucleic acids (template DNA). A template DNA may be any type of DNA of interest to a practitioner of the subject methods, including but not limited to genomic DNA or fragments thereof, complementary DNA (or “cDNA”, synthesized from any RNA or DNA of interest), recombinant DNA (e.g., plasmid DNA), or the like.

The number of distinct template nucleic acids of differing sequence in a given template nucleic acid composition may vary. In some instances, the number of distinct template nucleic acids in a given template nucleic acid composition ranges from 1 to 10⁸, such as 1 to 10⁷, including 1 to 10⁵.

The template nucleic acid composition employed in such methods may be any suitable nucleic acid sample. The nucleic acid sample that includes the template nucleic acid may be combined into the reaction mixture in an amount sufficient for producing the product nucleic acid. According to one embodiment, the nucleic acid sample is combined into the reaction mixture such that the final concentration of nucleic acid in the reaction mixture is from 1 fg/μL to 10 μg/μL, such as from 1 μg/μL to 5 μg/μL, such as from 0.001 μg/μL to 2.5 μg/μL, such as from 0.005 μg/μL to 1 μg/μL, such as from 0.01 μg/μL to 0.5 μg/μL, including from 0.1 μg/μL to 0.25 μg/μL.

Template nucleic acid components are nucleic acid samples that contain one or more types of template nucleic acids, as described in more detail below. Template nucleic acid components may be derived from cellular samples including cellular samples that contain a single cell or a population of cells containing, e.g., two or more cells. Cellular samples may be derived from a variety of sources including but not limited to, e.g., a cellular tissue, a biopsy, a blood sample, a cell culture, etc. Additionally, cellular samples may be derived from specific organs, tissues, tumors, neoplasms, or the like. Furthermore, cells from any population can be the source of a cellular sample used in the subject methods, such as a population of prokaryotic or eukaryotic single celled organisms including bacteria or yeast.

As such, in some instances, the source of an RNA sample utilized in the subject methods may be a mammalian cellular sample, such as a rodent (e.g., mouse or rat) cellular sample, a non-human primate cellular sample, a human cellular sample, or the like. In some instances, a mammalian cellular sample may be a mammalian blood sample, including but not limited to, e.g., a rodent (e.g., mouse or rat) blood sample, a non-human primate blood sample, a human blood sample, or the like.

In some instances, the template nucleic acid component is from a single cell. A template nucleic acid component from a single cell is a nucleic acid composition, e.g., a composition of one or more distinct nucleic acids, such as ribonucleic acids or deoxyribonucleic acids that originate or are derived from a single cell. As used herein, a “single cell” refers to one cell. Single cells useful as the source of template nucleic acids, e.g., RNAs or DNAs, can be obtained from an organism or tissue of interest, or from a biopsy, blood sample, or cell culture, etc. Additionally, cells from specific organs, tissues, tumors, neoplasms, or the like can be obtained and used in the methods described herein. In some instances, the template nucleic acid component is obtained from a portion of a single cell. Single cell portions of interest include, but are not limited to: organelles, exosomes or, more broadly, nucleic acids contained within, or associated with, a protein and/or lipid bearing membrane.

Template nucleic acids of template nucleic acid components employed in embodiments of the invention may contain a plurality of distinct template nucleic acids of differing sequence. Template nucleic acids (e.g., a template RNA, a template DNA, or the like) may be polymers of any length. While the length of the polymers may vary, in some instances the polymers are 10 nt or longer, 20 nt or longer, 50 nt or longer, 100 nt or longer, 500 nt or longer, 1000 nt or longer, 2000 nt or longer, 3000 nt or longer, 4000 nt or longer, or 5000 nt or longer. In certain aspects, template nucleic acids are polymers, where the number of bases on a polymer may vary, and in some instances is 10 nt or less, 20 nt or less, 50 nt or less, 100 nt or less, 500 nt or less, 1000 nt or less, 2000 nt or less, 3000 nt or less, 4000 nt or less, 5000 nt or less, 10,000 nt or less, 25,000 nt or less, 50,000 nt or less, 75,000 nt or less, or 100,000 nt or less.

Single cells, for use in the herein described methods relating thereto, may be obtained by any convenient method. For example, in some instances, single cells may be obtained through limiting dilution of a cellular sample. In some instances, the present methods may include a step of obtaining single cells. A single cell suspension can be obtained using standard methods known in the art including, for example, enzymatically using trypsin or papain to digest proteins connecting cells in tissue samples or releasing adherent cells in culture, or mechanically separating cells in a sample. Single cells can be placed in any suitable reaction vessel in which single cells can be treated individually, for example, a 96-well plate, a 384-well plate, or a plate with any number of wells such as 2000, 4000, 6000, or 10000 or more. The multi-well plate can be part of a chip and/or device. The present disclosure is not limited by the number of wells in the multi-well plate. In various embodiments, the total number of wells on the plate is from 100 to 200,000, or from 5000 to 10,000. In other embodiments the plate comprises smaller chips, each of which includes 5,000 to 20,000 wells. For example, a square chip may include 125 by 125 nanowells, with a diameter of 0.1 mm. Such methods are further described in greater detail below.

In some instances, single cells may be obtained by sorting a cellular sample using a cell sorter instrument. By “cell sorter” as used herein is meant any instrument that allows for the sorting of individual cells into an appropriate vessel for downstream processes, such as those processes of library preparation as described herein. Useful cell sorters include flow cytometers, such as those instruments utilized in fluorescence activated cell sorting (FACS). Flow cytometry is a well-known methodology using multi-parameter data for identifying and distinguishing between different particle (e.g., cell) types, i.e., particles that vary from one another in terms of label (wavelength, intensity), size, etc., in a fluid medium. In flow cytometrically analyzing a sample, an aliquot of the sample is first introduced into the flow path of the flow cytometer. When in the flow path, the cells in the sample are passed substantially one at a time through one or more sensing regions, where each of the cells is exposed individually to a source of light at a single wavelength (or in some instances two or more distinct sources of light) and measurements of scatter and/or fluorescent parameters, as desired, are separately recorded for each cell. The data recorded for each cell is analyzed in real time or stored in a data storage and analysis means, such as a computer, for later analysis, as desired.

Cells sorted using a flow cytometer may be sorted into a common vessel (i.e., a single tube), or may be separately sorted into individual vessels. For example, in some instances, cells may be sorted into individual wells of a multi-well plate, as described below.

Useful cell sorters also include multi-well-based systems that do not employ flow cytometry. Such multi-well based systems include essentially any system where cells may be deposited into individual wells of a multi-well container by any convenient means, including, e.g., through the use of Poisson distribution (i.e., limiting dilution) statistics, or individual placement of cells (e.g., through manual cell picking or dispensing using a robotic arm or pipettor). In some instances, useful multi-well systems include a multi-well wafer or chip, where cells are deposited into the wells of the wafer/chip and individually identified by a microscopic analysis system. In some instances, an automated microscopic analysis system may be employed in conjunction with a multi-well wafer/chip to automatically identify individual cells to be subjected to downstream analyses, including library preparation, as described herein.

In some instances, one or more cells may be sorted into or otherwise transferred to an appropriate reaction vessel. Reaction components may be added to reaction vessels, including, e.g., components for preparing a template nucleic acid component, components for generating a product double stranded cDNA, components for one or more library preparation reactions, etc.

The wells of a multi-well device can be designed such that a single well includes a single cell or a single droplet. An individual cell or droplet may also be isolated in any other suitable container, e.g., microfluidic chamber, droplet, nanowell, tube, etc. Any convenient method for manipulating single cells or droplets may be employed, where such methods include fluorescence activated cell sorting (FACS), robotic device injection, gravity flow, or micromanipulation and the use of semi-automated cell pickers (e.g., the Quixell™ cell transfer system from Stoelting Co.), etc. In some instances, single cells or droplets can be deposited in wells of a plate according to Poisson statistics (e.g., such that approximately 10%, 20%, 30% or 40% or more of the wells contain a single cell or droplet, which number can be defined by adjusting the number of cells or droplets in a given unit volume of fluid that is to be dispensed into the containers). In some instances, a suitable reaction vessel comprises a droplet (e.g., a microdroplet). Individual cells or droplets can, for example, be individually selected based on features detectable by microscopic observation, such as location, morphology, the presence of a reporter gene (e.g., expression), the presence of a bound antibody (e.g., antibody labelling), FISH, the presence of an RNA (e.g., intracellular RNA labelling), or qPCR.
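The relationship between loading density and single-cell occupancy under Poisson statistics can be illustrated with a short calculation. The sketch below assumes an idealized Poisson model in which the number of cells per well has mean λ, so that the fraction of wells holding exactly one cell is λ·e^(−λ). The function name is hypothetical.

```python
import math


def poisson_occupancy(mean_per_well: float) -> dict:
    """Return the expected fractions of empty, single and multiple-occupancy wells.

    Assumes cells are distributed among wells according to a Poisson
    distribution with the given mean number of cells per well.
    """
    p_empty = math.exp(-mean_per_well)
    p_single = mean_per_well * p_empty
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": 1.0 - p_empty - p_single,
    }


# Tabulate occupancy for a few illustrative loading densities.
for lam in (0.1, 0.3, 0.5, 1.0):
    occ = poisson_occupancy(lam)
    print(f"mean={lam:.1f}  single={occ['single']:.3f}  multiple={occ['multiple']:.3f}")
```

Under this idealized model the single-occupancy fraction peaks at about 37% when λ equals 1, at which point roughly a quarter of wells hold more than one cell. Higher single-occupancy fractions therefore generally call for a selection step, such as the microscopic identification described above, rather than dilution alone.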

Following obtainment of a desired cell population or single cells, e.g., as described above, nucleic acids can be released from the cells by lysing the cells. Lysis can be achieved by, for example, heating or freeze-thaw of the cells, or by the use of detergents or other chemical methods, or by a combination of these. However, any suitable lysis method can be used. In some instances, a mild lysis procedure can advantageously be used to prevent the release of nuclear chromatin, thereby avoiding genomic contamination of a cDNA library, and to minimize degradation of mRNA. For example, heating the cells at 72° C. for 2 minutes in the presence of Tween-20 is sufficient to lyse the cells while resulting in no detectable genomic contamination from nuclear chromatin. Alternatively, cells can be heated to 65° C. for 10 minutes in water (Esumi et al., Neurosci Res 60(4):439-51 (2008)); or 70° C. for 90 seconds in PCR buffer II (Applied Biosystems) supplemented with 0.5% NP-40 (Kurimoto et al., Nucleic Acids Res 34(5):e42 (2006)); or lysis can be achieved with a protease such as Proteinase K or by the use of chaotropic salts such as guanidine isothiocyanate (U.S. Publication No. 2007/0281313).

Calibration Control Template Composition

In some instances, preparation of the target nucleic acid template composition includes combining an initial nucleic acid composition, e.g., as described above, with a calibration control template composition, e.g., to produce a target nucleic acid template composition that is spiked with a control template mixture, which mixture may be made up of synthetic nucleic acids, naturally occurring nucleic acids or a combination thereof. As to structural requirements, the calibration control template nucleic acids at least include an amplicon structure with two primer binding sites which mimic natural target template nucleic acid. In some embodiments, the control template mix is a synthetic control template mix that includes calibration control template nucleic acids having sequences that mimic, but are different from, the sequences of target template nucleic acids. For example, a calibration control template nucleic acid could have one, two, or more point mutations downstream of primer binding sites. These mutations may be identified by downstream NGS analysis, allowing one to uniquely identify and differentiate sample specific and control template nucleic acids from each other. Other types of control template modifications (e.g., deletions, insertions, etc.) could be employed, as desired. Furthermore, the calibration control templates may mimic natural target nucleic acid template structures. For example, synthetic genes or gene fragments with several point mutations could be synthesized under the control of a T7 promoter, and T7 transcripts which mimic the natural template target mRNA sequences and structures could be synthesized in vitro and spiked into a cell extract or purified RNA at any known concentration. Moreover, a set of calibration standards designed against the same target mRNA with different mutations could be spiked into target template compositions at different amounts (e.g., at 1, 10, 100, 1000 copies per cell).

The spiked calibration controls could be employed as internal calibration standards in primer extension assays which allow one to calculate the actual concentration of natural target mRNA template. Moreover, the spiked calibration standards may be employed as universal standards to do quality control of target mRNAs in biological samples. For example, if cells are apoptotic, non-functional or damaged, the calibration standards spiked into a single-cell analysis would allow one to reveal these defective cells with degraded or missing template RNAs. The calibration control nucleic acid templates could be spiked directly into a cell, cell extract, cell fractions, purified cells or at any step of the primer extension and multiplex PCR protocol. In one embodiment, the control nucleic acid templates are spiked into cells, cell extracts or purified RNA. For example, in single-cell analysis, the control RNA templates could be mixed with lysis/hybridization buffer prior to droplet formation in 10x Genomics, Mission Bio or Bio-Rad single-cell analysis platforms. In other embodiments, the control template compositions could be spiked into cell lysates by pipet (e.g., ink-jet printer) or immobilized on beads together with barcoded gene specific primers. Calibration control templates could be designed and developed for a single gene or for gene sets, including a genome-wide set. For example, a mix of calibration control RNAs could be developed for a set of housekeeping genes. The housekeeping calibration control RNAs would allow one to compare the content and quality of target template RNAs in single cells or in multiplex analysis of a plurality of clinical samples. A set of cell specific marker calibration control RNAs may allow one to perform QC and identify specific cell types. A genome-wide set of calibration control RNAs could be employed to perform quantitative analysis of expression of all genes in any cell or biological sample. In another embodiment, calibration control RNAs designed against pathogens (e.g., viruses, bacterial species, etc.) may be employed as a unique tool to perform quantitative expression analysis of pathogenic genes in the background of human transcripts in clinical samples. Calibration control templates as internal calibration standards are a unique tool for analysis of a plurality of biological samples in parallel. The combination of barcoded gene specific primers and calibration control nucleic acid templates allows one to combine samples together at an early stage of the protocol and perform powerful multiplex analysis of hundreds of samples or thousands of single cells in parallel in a single test tube.
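One way in which spiked standards at known copy numbers could be used to calculate the amount of a natural target template is sketched below. The approach shown, an ordinary least-squares fit of log10(read count) against log10(spiked copies) followed by inversion of the fitted line, is an assumption made for illustration and is not recited in the disclosure. The function names and the read counts are hypothetical.

```python
import math


def fit_log_log(copies, reads):
    """Fit log10(reads) = slope * log10(copies) + intercept by least squares."""
    xs = [math.log10(c) for c in copies]
    ys = [math.log10(r) for r in reads]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_copies(read_count, slope, intercept):
    """Invert the fitted line to estimate template copies from a read count."""
    return 10 ** ((math.log10(read_count) - intercept) / slope)


# Standards spiked at 1, 10, 100 and 1000 copies per cell, with
# made-up read counts observed for each standard.
spiked_copies = [1, 10, 100, 1000]
standard_reads = [5, 50, 500, 5000]
slope, intercept = fit_log_log(spiked_copies, standard_reads)

# Estimate the copy number of the natural target from its read count.
print(estimate_copies(250, slope, intercept))
```

Because the standards differ from the natural target only by point mutations and share its primer binding sites, they may be expected to behave similarly in primer extension and amplification, which is what makes such an internal curve informative.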

Primer Extension Reaction Conditions

As reviewed above, aspects of the methods include contacting primers, e.g., oligo dT primers and/or GSPs, such as described above, with a nucleic acid template composition, which may be made up of an initial nucleic acid sample or be primer extension products, under primer extension reaction conditions. By “primer extension reaction conditions” is meant reaction conditions that permit polymerase-mediated extension of a 3′ end of a nucleic acid strand, i.e., primer, hybridized to a template nucleic acid. Achieving suitable reaction conditions may include selecting reaction mixture components, concentrations thereof, and a reaction temperature to create an environment in which the polymerase is active and the relevant nucleic acids in the reaction interact (e.g., hybridize) with one another in the desired manner.

The concentration of primers in the primer extension reaction mixture produced upon combination of the template nucleic acid and primers may vary, as desired. The amount of target template nucleic acid that is combined with the primers and other reagents, e.g., as described below, to produce a primer extension reaction mixture may vary. In some instances, the target nucleic acid template composition is combined into the reaction mixture such that the final concentration of nucleic acid in the reaction mixture ranges from 1 fg/μL to 10 μg/μL, such as from 1 μg/μL to 5 μg/μL, such as from 0.1 ng/μL to 50 ng/μL, such as from 0.5 ng/μL to 20 ng/μL, including from 1 ng/μL to 10 ng/μL.
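Because the recited concentration ranges mix units spanning nine orders of magnitude (fg/μL to μg/μL), it can be helpful to normalize values to a single unit before comparing them. The following minimal sketch does so. The function names and the ASCII unit keys (e.g., "ug" for μg) are hypothetical conveniences.

```python
# Conversion factors from each mass unit (per microliter) to micrograms per microliter.
UG_PER_UNIT = {"fg": 1e-9, "pg": 1e-6, "ng": 1e-3, "ug": 1.0}


def to_ug_per_ul(value: float, unit: str) -> float:
    """Convert a concentration expressed as <unit>/uL into ug/uL."""
    return value * UG_PER_UNIT[unit]


def in_range(value, unit, low=(1, "fg"), high=(10, "ug")) -> bool:
    """Check whether a concentration lies within [low, high], inclusive.

    The defaults correspond to the broadest range recited above,
    1 fg/uL to 10 ug/uL.
    """
    x = to_ug_per_ul(value, unit)
    return to_ug_per_ul(*low) <= x <= to_ug_per_ul(*high)


print(in_range(5, "ng"))                                       # broadest range
print(in_range(5, "ng", low=(1, "ng"), high=(10, "ng")))       # narrowest range
print(in_range(20, "ug"))                                      # above the upper bound
```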

In producing the primer extension reaction mixture, the primers and target template nucleic acid composition are combined with a number of additional reagents (e.g., to increase specificity, uniformity, yield, etc. of extension products), which may vary as desired. A variety of polymerases may be employed when practicing the subject methods. Reference to a particular polymerase, such as those exemplified below, will be understood to include functional variants thereof unless indicated otherwise. Examples of useful polymerases include DNA polymerases, e.g., where the template nucleic acid is DNA. In some instances, DNA polymerases of interest include, but are not limited to: thermostable DNA polymerases, such as may be obtained from a variety of bacterial species, including Thermus aquaticus (Taq), Thermus thermophilus (Tth), Thermus filiformis, Thermus flavus, Thermococcus litoralis, and Pyrococcus furiosus (Pfu) or modified and mutated versions of these DNA polymerases (e.g., Phusion DNA polymerase, Q5 DNA polymerase, etc.). Alternatively, where the target template nucleic acid composition is made up of RNA, the polymerase may be a reverse transcriptase (RT), where examples of reverse transcriptases include Moloney Murine Leukemia Virus reverse transcriptase (MMLV RT), e.g., SuperScript II, SuperScript III, MaxiScript reverse transcriptase (Thermo Fisher), SMARTScribe™ reverse transcriptase (Takara), AMV reverse transcriptase, Bombyx mori reverse transcriptase (e.g., Bombyx mori R2 non-LTR element reverse transcriptase), etc. In one embodiment, the enzymes with DNA polymerase activity are designed for hot-start primer extension reaction, e.g., used as a complex with a specific antibody or chemical compound which blocks enzymatic activity at low temperature but fully releases the activity at reaction conditions. For example, in some instances a hot-start reverse transcriptase composition, e.g., a complex between MMLV RT and Therma-Stop RT reagent (Thermagenix), is employed.

Primer extension reaction mixtures also include dNTPs. In certain aspects, each of the four naturally-occurring dNTPs (dATP, dGTP, dCTP and dTTP) is added to the reaction mixture. For example, dATP, dGTP, dCTP and dTTP may be added to the reaction mixture such that the final concentration of each dNTP is from 0.05 to 10 mM, such as from 0.1 to 2 mM, including 0.2 to 1 mM. According to one embodiment, at least one type of nucleotide added to the reaction mixture is a non-naturally occurring nucleotide, e.g., a modified nucleotide having a binding or other moiety (e.g., a fluorescent moiety) attached thereto, a nucleotide analog, or any other type of non-naturally occurring nucleotide that finds use in the subject methods or a downstream application of interest.
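The volume of a dNTP stock needed to reach a desired final concentration follows from the standard dilution relationship C1·V1 = C2·V2. The sketch below applies it. The stock concentration and reaction volume shown are illustrative assumptions, not values from the disclosure, and the function name is hypothetical.

```python
def stock_volume_ul(stock_mm: float, final_mm: float, final_volume_ul: float) -> float:
    """Return the stock volume (uL) that gives final_mm in final_volume_ul.

    Uses C1 * V1 = C2 * V2, i.e., V1 = C2 * V2 / C1.
    """
    if not 0 < final_mm <= stock_mm:
        raise ValueError("final concentration must be positive and not exceed the stock")
    return final_mm * final_volume_ul / stock_mm


# Assumed example: a stock containing 10 mM of each dNTP, a 20 uL reaction,
# and a target of 0.2 mM of each dNTP (within the 0.2 to 1 mM range above).
print(stock_volume_ul(stock_mm=10.0, final_mm=0.2, final_volume_ul=20.0))
```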

In addition to the template nucleic acid, primers, the polymerase, and dNTPs, the reaction mixture may include buffer components that establish an appropriate pH, salt concentration (e.g., KCl concentration), metal cofactor concentration (e.g., Mg²⁺ or Mn²⁺ concentration), and the like, for the extension reaction and template switching to occur. Other components may be included, such as one or more nuclease inhibitors (e.g., an RNase inhibitor and/or a DNase inhibitor), one or more additives for facilitating amplification/replication of GC rich sequences (e.g., GC-Melt™ reagent (Clontech Laboratories, Inc. (Mountain View, Calif.)), betaine, single-stranded binding proteins (e.g., T4 Gene 32, cold shock protein A (CspA), recA protein, and/or the like), DMSO, ethylene glycol, 1,2-propanediol, or combinations thereof), one or more molecular crowding agents (e.g., polyethylene glycol, or the like), one or more enzyme-stabilizing components (e.g., DTT present at a final concentration ranging from 1 to 10 mM (e.g., 5 mM)), and/or any other reaction mixture components useful for facilitating polymerase-mediated extension reactions.

The primer extension reaction mixture can have a pH suitable for the primer extension reaction. In certain embodiments, the pH of the reaction mixture ranges from 5 to 9, such as from 7 to 9, including from 8 to 9, e.g., 8 to 8.5. In some instances, the reaction mixture includes a pH adjusting agent. pH adjusting agents of interest include, but are not limited to, sodium hydroxide, hydrochloric acid, phosphoric acid buffer solution, citric acid buffer solution, and the like. For example, the pH of the reaction mixture can be adjusted to the desired range by adding an appropriate amount of the pH adjusting agent.

The temperature range suitable for production of the product nucleic acid may vary according to factors such as the particular polymerase employed, the melting temperatures of any optional primers employed, etc. According to one embodiment, the primer extension reaction conditions include bringing the reaction mixture to a temperature ranging from 4 to 72° C., such as from 16 to 70° C., e.g., 37 to 65° C., such as 60° C. to 65° C. The temperature of the reaction mixture may be maintained for a sufficient period of time for polymerase mediated, template directed primer extension to occur. While the period of time may vary, in some instances the period of time ranges from 5 to 60 minutes, such as 15 to 45 minutes, e.g., 30 minutes.

In a given primer extension reaction condition, where desired, hybridization complexes of template and primer may be purified, e.g., via separation from an excess of non-bound primers, e.g., by nuclease treatment or binding to a solid support, such as beads, e.g., as described above. In this way, excess primers, such as oligo dT primers and/or gene-specific primers, may be removed in order to achieve a high specificity of the primer extension reaction from the target template sequences.

As reviewed above, the barcode labelling step, i.e., where a barcode domain is transferred from an initial donor nucleic acid to a gene-specific primer, may be performed before, after or at the same time as a primer extension step. Transfer is mediated by a ligation reaction, where in some instances linker domains and a linker oligonucleotide are employed to enhance ligation results. Where ligation occurs at the same time as primer extension, primer extension conditions as described may be employed, where reagents necessary for ligase activity, e.g., NAD, ATP, etc., are included. Alternatively, ligation may be carried out in a step separate from primer extension.

Where desired, the primer extension reaction conditions may include one or more temperature cycling steps. For example, in some instances, the primer extension product composition is produced by a method that includes first contacting the target nucleic acid template composition with a first primer subset that includes, for example, the forward primers of the set of primer pairs under primer extension reaction conditions to produce a forward primer extension product composition; increasing the temperature to denature the resultant product and template strands and inactivate any additional enzymatic activity (e.g., exonuclease I activity added after the extension step to degrade PCR primers) present in the forward primer extension product composition (where the elevated temperature may vary, ranging in some instances from 90 to 100° C., such as 95° C.); and then contacting the resultant denatured forward primer extension product composition with a second primer subset that includes the reverse primers of the set of primer pairs under primer extension reaction conditions to produce the desired primer extension product composition. Where desired, the primer extension products and template nucleic acids may be separated from any free forward primers prior to contact with the set of reverse primers. The extended DNA products after the first and second extension steps may be purified from the excess of the primers using any convenient protocol, including primer digestion with exonucleases (e.g., exonuclease I) or purification, such as with magnetic beads or spin columns, etc.

Amplification

As reviewed above, in some instances primer extension products are amplified, where amplicons are produced from the primer extension products. The term “amplicon” is employed in its conventional sense to refer to a piece of DNA that is the product of artificial amplification or replication events, e.g., as produced using various methods including polymerase chain reactions (PCR), ligase chain reactions (LCR), etc. Where primer extension products are amplified, the primer extension products, e.g., as described above, may include additional domains that are employed in subsequent amplification steps to produce a desired amplicon composition. For example, as illustrated in FIGS. 1 to 4, flanking anchor domains are provided in the primer extension products, where the flanking anchor domains include universal priming sites which may be employed in PCR amplification.

As such, embodiments of the methods may include combining a primer extension product composition with universal forward and reverse primers under amplification conditions sufficient to produce a desired product barcoded amplicon composition. The forward and reverse universal primers may be configured to bind to the common forward and reverse anchor domains and thereby amplify nucleic acids present in the primer extension product compositions. The universal forward and reverse primers may vary in length, ranging in some instances from 10 to 75 nt, such as 15 to 60 nt.

In some instances, the universal forward and reverse primers include one or more additional domains, such as but not limited to: an indexing domain, a clustering domain, a Next Generation Sequencing (NGS) adaptor domain (i.e., high-throughput sequencing (HTS) adaptor domain), etc. Alternatively, these domains may be introduced during one or more subsequent steps, such as one or more subsequent amplification reactions, e.g., as described in greater detail below. The amplification reaction mixture will include, in addition to the primer extension product composition and universal forward and reverse primers, other reagents, as desired, such as polymerase, dNTPs, buffering agents, etc., e.g., as described above.

Amplification conditions may vary. In some instances, the reaction mixture is subjected to polymerase chain reaction (PCR) conditions. PCR conditions include a plurality of reaction cycles, where each reaction cycle includes: (1) a denaturation step, (2) an annealing step, and (3) a polymerization step. The number of reaction cycles will vary depending on the application being performed, and may be 1 or more, including 2 or more, 3 or more, 4 or more, and in some instances may be 15 or more, such as 20 or more and including 30 or more, where the number of cycles will typically range from about 12 to 24. The denaturation step includes heating the reaction mixture to an elevated temperature and maintaining the mixture at the elevated temperature for a period of time sufficient for any double stranded or hybridized nucleic acid present in the reaction mixture to dissociate. For denaturation, the temperature of the reaction mixture may be raised to, and maintained at, a temperature ranging from 85 to 100° C., such as from 90 to 98° C. and including 94 to 98° C., for a period of time ranging from 3 to 120 sec, such as 5 to 30 sec. Following denaturation, the reaction mixture will be subjected to conditions sufficient for primer annealing to template DNA present in the mixture. The temperature to which the reaction mixture is lowered to achieve these conditions may be chosen to provide optimal efficiency and specificity, and in some instances ranges from about 50 to 75° C., such as 60 to 74° C. and including 68 to 72° C. Annealing conditions may be maintained for a sufficient period of time, e.g., ranging from 10 sec to 30 min, such as from 10 sec to 5 min. Following annealing of primer to template DNA, or during annealing of primer to template DNA, the reaction mixture may be subjected to conditions sufficient to provide for polymerization of nucleotides to the primer ends in a manner such that the primer is extended in a 5′ to 3′ direction using the DNA to which it is hybridized as a template, i.e., conditions sufficient for enzymatic production of primer extension product. To achieve polymerization conditions, the temperature of the reaction mixture may be raised to or maintained at a temperature ranging from 65 to 75° C., such as from about 68 to 72° C., and maintained for a period of time ranging from 15 sec to 20 min, such as from 20 sec to 5 min. In some embodiments, the annealing step may be omitted, and the protocol may include only denaturation and polymerization steps as described above. The above cycles of denaturation, annealing and polymerization may be performed using an automated device, typically known as a thermal cycler. Thermal cyclers that may be employed are described in U.S. Pat. Nos. 5,612,473; 5,602,756; 5,538,871; and 5,475,610, the disclosures of which are herein incorporated by reference.
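By way of illustration only, the cycling conditions described above may be written down as a small program that checks each step against the stated temperature and time ranges. The following Python sketch is not part of the disclosed methods; the names Step, LIMITS, check_step and total_hold_seconds, and the specific example temperatures and hold times, are assumptions chosen for the example, and ramp times between steps are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # "denature", "anneal" or "extend"
    temp_c: float    # hold temperature in degrees C
    seconds: int     # hold time in seconds


# Broadest ranges quoted in the text: (temperature range), (time range in seconds).
LIMITS = {
    "denature": ((85, 100), (3, 120)),
    "anneal": ((50, 75), (10, 30 * 60)),
    "extend": ((65, 75), (15, 20 * 60)),
}


def check_step(step):
    """Return True if the step falls inside the ranges quoted in the text."""
    (t_lo, t_hi), (s_lo, s_hi) = LIMITS[step.name]
    return t_lo <= step.temp_c <= t_hi and s_lo <= step.seconds <= s_hi


def total_hold_seconds(steps, cycles):
    """Total programmed hold time, ignoring ramping between temperatures."""
    return cycles * sum(step.seconds for step in steps)


# Hypothetical three-step program (denaturation, annealing, polymerization).
three_step = [Step("denature", 95, 15), Step("anneal", 60, 30), Step("extend", 72, 30)]
# Hypothetical two-step program in which the annealing step is omitted.
two_step = [Step("denature", 95, 15), Step("extend", 68, 60)]

print(all(check_step(s) for s in three_step), total_hold_seconds(three_step, 18))
```

The two-step list reflects the embodiment in which only denaturation and polymerization steps are used; 18 cycles is simply one value inside the typical range of about 12 to 24.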

The product amplicon composition of this first amplification reaction will include amplicons corresponding to the gene specific domains that are present in the initial target nucleic acid composition and are bounded by primer pairs present in the employed set of gene specific primers, with a barcode sequence at one side of the amplicon. In some instances, the number of distinct amplicons of differing sequence in this initial amplicon composition ranges from 10 to 19,000, 10 to 15,000, 10 to 10,000, and 10 to 8,000, such as 25 to 18,500, 25 to 12,000, 25 to 8,000, and 25 to 7,500, including 50 to 15,000, 50 to 10,000 and 50 to 5,000, where in some instances the number of distinct amplicons present in this initial amplicon composition is 25 or more, including 50 or more, such as 100 or more, 250 or more, 500 or more, 1,000 or more, 1,500 or more, 2,500 or more, 5,000 or more, 7,500 or more, 8,500 or more, 10,000 or more, 15,000 or more, or 18,000 or more. In some instances, this initial amplicon composition includes sequences found in at least a subset of the genes listed in Table 2, e.g., a subset of 10 to 5,000, such as 20 to 5,000, 50 to 5,000, 100 to 5,000, including 100 to 4,000, 100 to 3,000, and 100 to 2,000 of the genes listed in Table 2, or in some instances the amplicon composition includes sequences found in all of the genes listed in Table 2. A subject amplicon composition may include or exclude multiple different product amplicons corresponding to the same gene as amplified by two or more different primer pairs directed to the gene. The multiple product amplicons making up the amplicon composition may vary in length, ranging in length in some instances from 50 to 1000 nt, such as 60 to 500 nt, including 70 to 250 nt.

The sample barcoded initial product amplicon composition may be employed in a variety of different applications, including evaluation of the expression profile of the sample from which the template target nucleic acid was obtained. In such instances, the expression profile may be obtained from the amplicon composition using any convenient protocol, such as but not limited to differential gene expression analysis, array-based gene expression analysis, NGS sequencing, etc.

For example, the barcoded amplicon composition may be employed in hybridization assays in which a nucleic acid array that displays “probe” nucleic acids for each of the genes to be assayed/profiled in the profile to be generated is employed. In these assays, the amplicon composition is first prepared from the initial target nucleic acid sample being assayed as described above, where preparation may include labeling of the target nucleic acids with a label, e.g., a member of a signal producing system. Following amplicon production, e.g., as described above, the sample is contacted with the array under hybridization conditions, whereby complexes are formed between target nucleic acids and the complementary probe sequences attached to the array surface. The presence of hybridized complexes is then detected, either qualitatively or quantitatively. The detection and quantification of different barcodes may be achieved in follow-up hybridization steps with labeled targets complementary to barcode domains of the amplicons. Specific hybridization technology which may be practiced to generate the expression profiles employed in the subject methods includes the technology described in U.S. Pat. Nos. 5,143,854; 5,288,644; 5,324,633; 5,432,049; 5,470,710; 5,492,806; 5,503,980; 5,510,270; 5,525,464; 5,547,839; 5,580,732; 5,661,028; 5,800,992; the disclosures of which are herein incorporated by reference; as well as WO 95/21265; WO 96/31622; WO 97/10365; WO 97/27317; EP 373 203; and EP 785 280. In these methods, an array of “probe” nucleic acids that includes a probe for each of the phenotype determinative genes whose expression is being assayed is contacted with target nucleic acids as described above. Contact is carried out under hybridization conditions, e.g., stringent hybridization conditions, and unbound nucleic acid is then removed. The resultant pattern of hybridized nucleic acid provides information regarding expression for each of the genes that have been probed, where the expression information is in terms of whether or not the gene is expressed and, typically, at what level, where the expression data, i.e., expression profile (e.g., in the form of a transcriptome), may be both qualitative and quantitative.

Alternatively, non-array-based methods for quantifying the levels of one or more nucleic acids in a sample may be employed, including quantitative PCR, real-time quantitative PCR, and the like. (For general details concerning real-time PCR see Real-Time PCR: An Essential Guide, K. Edwards et al., eds., Horizon Bioscience, Norwich, U.K. (2004)).

In some embodiments, the method further includes sequencing the multiple barcoded product amplicons, e.g., by using a Next Generation Sequencing (NGS) protocol. In such instances, if not already present, the methods may include modifying the initial amplicon composition to include one or more components employed in a given NGS protocol, e.g., sequencing platform adaptor constructs, indexing domains, clustering domains, etc.

By “sequencing platform adapter construct” is meant a nucleic acid construct that includes at least a portion of a nucleic acid domain (e.g., a sequencing platform adapter nucleic acid sequence) or complement thereof utilized by a sequencing platform of interest, such as a sequencing platform provided by Illumina® (e.g., the NovaSeq™, NextSeq™, HiSeq™, MiSeq™ and/or Genome Analyzer™ sequencing systems); Thermo Fisher (e.g., Ion Torrent™ (such as the Ion PGM™ and/or Ion Proton™ sequencing systems) and Life Technologies™ (such as a SOLiD sequencing system)); Pacific Biosciences (e.g., the PACBIO RS II sequencing system); Roche (e.g., the 454 GS FLX+ and/or GS Junior sequencing systems); Oxford Nanopore Technologies (e.g., MinION™, GridION™, PromethION™ sequencing systems) or any other sequencing platform of interest.

In certain aspects, the sequencing platform adapter construct includes a nucleic acid domain (e.g., a “capture site” or “capture sequence”) that specifically binds to a surface-attached sequencing platform oligonucleotide (e.g., the P5/i5 or P7/i7 oligonucleotides attached to the surface of a flow cell in an Illumina® sequencing system), where the construct may include one or more additional domains, such as but not limited to: a sequencing primer binding domain or clustering domain (e.g., a domain to which the Read 1 or Read 2 primers of the Illumina® platform may bind); an indexing domain (e.g., a domain that uniquely identifies the sample source of the nucleic acid being sequenced to enable sample multiplexing by marking every molecule from a given sample with a specific index or “tag”); a barcode sequencing primer binding domain (a domain to which a primer used for sequencing a barcode binds); a unique molecular identification domain (e.g., a molecular index tag, such as a randomized tag of 4, 6, or other number of nucleotides) for uniquely marking molecules of interest to determine expression levels based on the number of instances a unique tag is sequenced; a complement of any such domains; or any combination thereof. In certain aspects, a barcode domain (e.g., sample index tag) and a molecular identification domain (e.g., a molecular index tag) may be included in the same nucleic acid.
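By way of illustration only, the use of a sample index tag together with a molecular index tag may be sketched as a simple counting routine, in which each sequencing read has already been reduced to a (sample barcode, molecular tag, gene) triple. The following Python sketch is not part of the disclosed methods; the function name count_unique_tags, the tuple layout and the example barcode and tag sequences are assumptions chosen for the example, and no correction of sequencing errors within tags is attempted.

```python
from collections import defaultdict


def count_unique_tags(reads):
    """Count distinct molecular tags per (sample barcode, gene).

    `reads` is an iterable of (sample_barcode, molecular_tag, gene) tuples.
    Reads sharing all three values are treated as copies of one original
    molecule and are counted once.
    """
    seen = defaultdict(set)
    for sample_barcode, molecular_tag, gene in reads:
        seen[(sample_barcode, gene)].add(molecular_tag)
    return {key: len(tags) for key, tags in seen.items()}


# Hypothetical reads from two samples (barcodes ACGT and TTAG).
reads = [
    ("ACGT", "AAAAAA", "GAPDH"),
    ("ACGT", "AAAAAA", "GAPDH"),  # same tag again: an amplification duplicate
    ("ACGT", "CCCCCC", "GAPDH"),
    ("TTAG", "AAAAAA", "GAPDH"),
    ("TTAG", "GGGGGG", "ACTB"),
]
counts = count_unique_tags(reads)
print(counts[("ACGT", "GAPDH")])
```

Counting distinct tags rather than raw reads keeps the expression estimate from being inflated by unequal amplification of individual molecules.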

The sequencing platform adapter constructs may include nucleic acid domains (e.g., “sequencing adapters”) of any length and sequence suitable for the sequencing platform of interest. In certain aspects, the nucleic acid domains are from 4 to 200 nucleotides in length. For example, the nucleic acid domains may be from 4 to 100 nucleotides in length, such as from 6 to 75, from 8 to 50, or from 10 to 40 nucleotides in length. According to certain embodiments, the sequencing platform adapter construct includes a nucleic acid domain that is from 2 to 8, from 9 to 15, from 16 to 22, from 23 to 29, or from 30 to 36 nucleotides in length.

The nucleic acid domains may have a length and sequence that enables a polynucleotide (e.g., an oligonucleotide) employed by the sequencing platform of interest to specifically bind to the nucleic acid domain, e.g., for solid phase amplification and/or sequencing by synthesis of the cDNA insert flanked by the nucleic acid domains. Example nucleic acid domains include the P5 (5′-AATGATACGGCGACCACCGA-3′) (SEQ ID NO:03), P7 (5′-CAAGCAGAAGACGGCATACGAGAT-3′) (SEQ ID NO:04), Read 1 primer (5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′) (SEQ ID NO:05) and Read 2 primer (5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′) (SEQ ID NO:06) domains employed on the Illumina®-based sequencing platforms. Other example nucleic acid domains include the A adapter (5′-CCATCTCATCCCTGCGTGTCTCCGACTCAG-3′) (SEQ ID NO:07) and P1 adapter (5′-CCTCTCTATGGGCAGTCGGTGAT-3′) (SEQ ID NO:08) domains employed on the Ion Torrent™-based sequencing platforms.
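By way of illustration only, the manner in which such domains flank an insert may be sketched in a few lines of Python using the P5 and P7 sequences given above. The sketch is not part of the disclosed methods. It assumes, for the example, that one strand of a finished library molecule reads P5 at its 5′ end and the reverse complement of P7 at its 3′ end; it omits the Read 1 and Read 2 primer domains and any index for brevity; and the insert sequence and function names are hypothetical.

```python
P5 = "AATGATACGGCGACCACCGA"        # SEQ ID NO:03, as given in the text
P7 = "CAAGCAGAAGACGGCATACGAGAT"    # SEQ ID NO:04, as given in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_library_strand(insert):
    """Flank an insert with P5 (5' side) and the reverse complement of P7 (3' side)."""
    return P5 + insert + reverse_complement(P7)


def has_flanking_domains(strand):
    """Check that a strand starts with P5 and ends with the reverse complement of P7."""
    return strand.startswith(P5) and strand.endswith(reverse_complement(P7))


example_insert = "ACGTACGTAC"  # hypothetical insert, for illustration only
strand = build_library_strand(example_insert)
print(len(strand), has_flanking_domains(strand))
```

A check of this kind may be used to confirm that a designed construct carries both capture domains before primers are ordered; actual adapter layouts should be taken from the platform manufacturer's documentation.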

The nucleotide sequences of nucleic acid domains useful for sequencing on a sequencing platform of interest may vary and/or change over time. Adapter sequences are typically provided by the manufacturer of the sequencing platform (e.g., in technical documents provided with the sequencing system and/or available on the manufacturer's website). Based on such information, the sequence of the sequencing platform adapter construct of the template switch oligonucleotide (and optionally, a first strand synthesis primer, amplification primers, and/or the like) may be designed to include all or a portion of one or more nucleic acid domains in a configuration that enables sequencing the nucleic acid insert (corresponding to the template nucleic acid) on the platform of interest.

The sequencing adaptors may be added to the amplicons of the initial amplicon composition using any convenient protocol, where suitable protocols that may be employed include, but are not limited to: amplification protocols, ligation protocols, etc. In some instances, amplification protocols are employed. In such instances, the initial amplicon composition may be combined with forward and reverse sequencing adaptor primers that include one or more sequencing adaptor domains, e.g., as described above, as well as domains that bind to universal primer sites found in all of the amplicons in the composition, e.g., the forward and reverse anchor domains, such as described above. As reviewed above, amplification conditions may include the addition of forward and reverse sequencing adaptor primers configured to bind to the common forward and reverse anchor domains and thereby amplify all or a desired portion of the product nucleic acid, dNTPs, and a polymerase suitable for effecting the amplification (e.g., a thermostable polymerase for polymerase chain reaction), where examples of such conditions are further described above. The forward and reverse sequencing adaptor primers employed in these embodiments may vary in length, ranging in length in some instances from 20 to 60 nt, such as 25 to 50 nt. Addition of NGS sequencing adaptors results in the production of a composition which is configured for sequencing by an NGS sequencing protocol, i.e., an NGS library.

In certain aspects, the methods of the present disclosure further include subjecting the NGS library to an NGS protocol, e.g., as described above. The NGS protocol will vary depending on the particular NGS sequencing system employed. Detailed protocols for sequencing an NGS library, e.g., which may include further amplification (e.g., solid-phase amplification), sequencing the amplicons, and analyzing the sequencing data, are available from the manufacturer of the NGS system employed. Protocols for performing next generation sequencing, including methods of processing the sequencing data, e.g., to count and tally sequences and assemble transcriptome data therefrom, are further described in published United States Patent Application 20150344938, the disclosure of which is herein incorporated by reference.

Pooling

Where desired, a given workflow may include a pooling step where a product composition, e.g., made up of synthesized first strand cDNAs or synthesized double stranded cDNAs, is combined or pooled with product compositions obtained from one or more additional samples, e.g., cells. The number of different product compositions produced from different samples, e.g., cells, that are combined or pooled in such embodiments may vary, where the number ranges in some instances from 2 to 50,000, such as 3 to 25,000, including 4 to 20,000, such as 5 to 10,000, where in some instances the number ranges from 100 to 10,000, such as 1,000 to 5,000. Prior to or after pooling, the product composition(s) can be amplified, e.g., by polymerase chain reaction (PCR), such as described above.

Template Switch

In some embodiments, the primer extension reaction conditions using an RNA template may incorporate a template switching oligonucleotide, e.g., with an optional sample-specific barcode domain and anchor domain. Template switch is described in U.S. Pat. Nos. 5,962,271 and 5,962,272, as well as Published PCT Application Publication No. WO2015/027135; the disclosures of which are herein incorporated by reference. Under the primer extension conditions, the template switching oligonucleotide may be employed by reverse transcriptase as a second template in the primer extension reaction. As a result of this extension reaction, the sample-specific barcode sequences of the template switch oligonucleotide will be incorporated at the 3′-end of the synthesized cDNA. In another embodiment, in addition to the template switching oligonucleotide, the extension reaction may also include a set of gene-specific oligonucleotides complementary to the target regions of the RNA templates. The oligonucleotides of this set are designed to be complementary to the gene-specific portion of the forward gene-specific primers and to have sequences or modifications at the 3′-end which block extension of these oligonucleotides by reverse transcriptase. Under these conditions, the RNase H activity of reverse transcriptase (or an RNase H enzyme added externally) will degrade the RNA target region complementary to the gene-specific oligonucleotide, thus generating target RNA templates truncated at the sites selected for design of the forward gene-specific primers. Using target RNA templates specifically truncated at the sites of the forward gene-specific primers makes it possible to add, at these sites, the sample-specific barcode encoded by the template switching oligonucleotide through extension of the reverse gene-specific primers with reverse transcriptase. Therefore, an extension reaction composition comprising target RNAs, barcoded template switching oligonucleotides, oligonucleotides complementary to the gene-specific portion of the forward gene-specific primers, and reverse gene-specific primers allows the target amplicon regions to be specifically extended and barcoded in a single reaction step catalyzed by reverse transcriptase.

Utility

The subject methods find use in a variety of applications, including expression profiling or transcriptome determination applications, where a sample is evaluated to obtain an expression profile of the sample. By “expression profile” is meant the expression level of a gene of interest in a sample, which may be a single cell or a combination of multiple cells (e.g., as determined by quantitating the level of an RNA or protein encoded by the gene of interest), or a set of expression levels of a plurality (e.g., 2 or more) of genes of interest. In certain aspects, the expression profile includes expression level data for 1 or more, 2 or more, 5 or more, 10 or more, 20 or more, 50 or more, 100 or more, 200 or more, 300 or more, 400 or more, 500 or more, 1,000 or more, 5,000 or more, 10,000 or more, 15,000 or more, e.g., 18,000 or more genes of interest. According to one embodiment, the expression profile includes expression level data of from 50 to 8000 genes of interest, e.g., from 1000 to 5000 genes of interest. In some embodiments, the expression profile includes expression level data of from 50 to 19,000 genes of interest, e.g., from 1000 to 18,000 genes of interest. In certain aspects, the methods may be employed in detecting and/or quantitating the expression of all or substantially all of the cancer associated genes transcribed in a target cell. In a preferred embodiment, the methods are employed for profiling all known cell and tissue marker genes, as listed in Table 2. In certain aspects, the methods may be employed in detecting and/or quantitating the expression of all or substantially all of the genes transcribed by an organism, e.g., a mammal, such as a human or mouse, in a target cell. The terms “expression” and “gene expression” include transcription and/or translation of nucleic acid material. For example, gene expression profiling may include detecting and/or quantitating one or more of any RNA species transcribed from the genomic DNA of the target cell, including pre-mRNAs, mRNAs, non-coding RNAs, microRNAs, small RNAs, regulatory RNAs, and any combination thereof.

Expression levels of an expressed sequence are optionally normalized by reference or comparison to the expression level(s) of one or more control expressed genes, including but not limited to, ACTB, GAPDH, HPRT-1, RPL25, RPS30, and combinations thereof. These “normalization genes” have expression levels that are relatively constant among target cells in the cellular sample.
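By way of illustration only, normalization against control genes may be sketched as dividing every gene's count in a sample by a single scale factor derived from the control genes detected in that sample. The following Python sketch is not part of the disclosed methods; the use of the geometric mean as the scale factor is an assumption made for the example (the text does not prescribe a particular statistic), and the function name and the example counts are hypothetical.

```python
import math


def normalize(counts, control_genes):
    """Divide each gene's count by the geometric mean of the detected control genes.

    `counts` maps gene name to a non-negative count for one sample.
    Control genes with a zero or missing count are skipped.
    """
    detected = [counts[g] for g in control_genes if counts.get(g, 0) > 0]
    if not detected:
        raise ValueError("no control gene was detected in this sample")
    scale = math.exp(sum(math.log(v) for v in detected) / len(detected))
    return {gene: value / scale for gene, value in counts.items()}


# Hypothetical counts for one cell; CD3E stands in for a gene of interest.
sample_counts = {"ACTB": 100, "GAPDH": 400, "CD3E": 50}
normalized = normalize(sample_counts, ["ACTB", "GAPDH", "HPRT-1"])
print(round(normalized["CD3E"], 3))
```

Here the geometric mean of 100 and 400 is 200, so the gene of interest is reported as approximately 0.25 relative to the controls.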

In some instances, quantitative analysis of gene expression is performed using a set of calibration control templates. Internal calibration control templates, which mimic but differ from natural target RNAs and are spiked into cells or cell lysates in specific amounts, may be effectively used for truly quantitative expression analysis. The calibration control RNAs may be developed for a set of genes (e.g., cell marker genes) or for a genome-wide set of transcripts. In order to address the reproducibility of the profiling assay for multiple biological samples (e.g., thousands of single cells), embodiments of the invention uniquely employ the strategy of using barcoded reverse gene specific primers. Target template RNAs (e.g., present in cell extracts) hybridized with barcoded reverse gene specific primers may be combined for all follow-up steps. The strategy of barcoding and combining target RNAs at an early (hybridization) stage allows for significantly reduced cost of the assay, eliminates sample-to-sample profiling variability due to differences in experimental assay conditions, etc. The developed protocol, which addresses sample-to-sample and batch effect variability, has significant utility in biomarker discovery in clinical samples (e.g., whole blood).
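By way of illustration only, the use of spiked-in calibration control templates may be sketched as fitting a conversion factor between the known spiked amount and the observed count, and then applying that factor to target counts. The following Python sketch is not part of the disclosed methods; the proportional (through-the-origin) model, the function names and the example values are assumptions made for the example.

```python
def fit_counts_per_unit(spikes):
    """Least-squares slope through the origin for (known_amount, observed_count) pairs."""
    numerator = sum(amount * count for amount, count in spikes)
    denominator = sum(amount * amount for amount, _ in spikes)
    if denominator == 0:
        raise ValueError("at least one non-zero spiked amount is required")
    return numerator / denominator


def estimate_amount(count, counts_per_unit):
    """Convert an observed count for a target RNA into an estimated input amount."""
    return count / counts_per_unit


# Hypothetical calibration controls: (molecules spiked in, counts observed).
spikes = [(10, 50), (100, 500), (1000, 5000)]
slope = fit_counts_per_unit(spikes)
print(slope, estimate_amount(250, slope))
```

In this example each spiked molecule yields about five counts, so a target observed at 250 counts corresponds to an estimated input of about 50 molecules.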

According to certain embodiments, the expression profile includes “binary” or “qualitative” information regarding the expression of each gene of interest in a target cell. That is, in such embodiments, for each gene of interest, the expression profile only includes information that the gene is expressed or not expressed (e.g., above an established threshold level) in the sample being analyzed, e.g., tissue, cell, etc. In other embodiments, the expression profile includes quantitative information regarding the level of expression (e.g., based on rate of transcription, rate of splicing and/or RNA abundance) of one or more genes of interest. A qualitative and/or quantitative expression profile from the sample may be compared to, e.g., a comparable expression profile generated from other samples and/or one or more reference profiles from cells known to have a particular biological phenotype or condition (e.g., a disease condition, such as a tumor cell; or treatment condition, such as a cell treated with an agent, e.g., a drug). When the profiles being compared are quantitative expression profiles, the comparison may include determining a fold-difference between one or more genes in the expression profile of a target cell and the corresponding genes in the expression profile(s) of one or more different target cells in the cellular sample, or the corresponding genes in a reference cell or cellular sample. Alternatively, or additionally, the expression profile may include information regarding the relative expression levels of different genes in a single target cell. In certain aspects, the fold difference in intercellular expression levels or intracellular expression levels can be determined to be 0.1 fold or more, 0.5 fold or more, 1 fold or more, 1.5 fold or more, 2 fold or more, 2.5 fold or more, 3 fold or more, 4 fold or more, 5 fold or more, 6 fold or more, 7 fold or more, 8 fold or more, 9 fold or more, or 10 fold or more, for example.
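By way of illustration only, the binary and quantitative comparisons described above may be sketched as two short functions: one that reduces a profile to expressed/not-expressed calls against a threshold, and one that computes per-gene fold differences between a target profile and a reference profile. The following Python sketch is not part of the disclosed methods; the function names, gene names, threshold and expression values are hypothetical.

```python
def binary_profile(levels, threshold):
    """Map each gene to True if its level is above the threshold, else False."""
    return {gene: level > threshold for gene, level in levels.items()}


def fold_differences(target, reference):
    """Per-gene ratio target/reference for genes present in both profiles.

    Genes with a zero reference level are skipped, since the ratio is undefined.
    """
    shared = target.keys() & reference.keys()
    return {g: target[g] / reference[g] for g in shared if reference[g] > 0}


# Hypothetical normalized expression levels for a target cell and a reference cell.
target_cell = {"GENE_A": 8.0, "GENE_B": 1.0, "GENE_C": 0.0}
reference_cell = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 0.0}

calls = binary_profile(target_cell, threshold=0.5)
folds = fold_differences(target_cell, reference_cell)
print(calls["GENE_C"], folds["GENE_A"], folds["GENE_B"])
```

In this example GENE_A is 4 fold higher in the target cell and GENE_B is 0.25 fold (i.e., 4 fold lower), while GENE_C is called as not expressed and is left out of the fold comparison.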

In some instances, the methods may be employed to determine the transcriptome of a sample. The term “transcriptome” is employed in its conventional sense to refer to the set of all messenger RNA molecules in one cell or a population of cells. In some instances, a transcriptome includes the amount or concentration of each RNA molecule in addition to the molecular identities. The methods described herein may be employed in detecting and/or quantitating the expression of all genes or substantially all genes of the transcriptome of an organism, e.g., a mammalian organism, such as a human or a mouse, for a particular target cell or a population of cells.

Expression profiles obtained using methods of the invention may be employed in a variety of applications. For example, an expression profile may be indicative of the biological condition of the sample or host from which the sample is obtained, including but not limited to a disease condition (e.g., a cancerous condition, metastatic potential, an epithelial mesenchymal transition (EMT) characteristic, and/or any other disease condition of interest), the condition of the cell in response to treatment with any physical action (e.g., heat shock, hypoxia, normoxia, hydrodynamic stress, radiation, and/or the like), the condition of the cell in response to treatment with chemical compounds (e.g., drugs, cytotoxic agents, nutrients, salts, and/or the like) or biological extracts or entities (e.g., viruses, bacteria, other cell types, growth factors, biologics, and/or the like), and/or any other biological condition of interest (e.g., immune response, senescence, inflammation, motility, and/or the like).

Embodiments of the invention find further application in tumor microenvironment analysis applications. Transcriptome data obtained, e.g., as described above, may be employed to determine the cellular constitution of a tumor sample, e.g., to provide an evaluation of the types of cells present in a tumor sample, such as infiltrating hematopoietic cells, tumor cells and bulk tissue cells. For example, transcriptome data may be employed to assess whether or not a tumor sample includes infiltrating immune cells, including those of the adaptive and/or innate immune system, such as but not limited to: T cells, B cells, natural killer cells, monocytes, granulocytes, neutrophils, basophils, platelets, and their myeloid and lymphoid progenitor cells, hematopoietic stem cells, and the like. Such information may be used, e.g., in therapy determination applications, for example where the presence of infiltrating immune cells indicates that a patient will be responsive to immunotherapy while the absence of infiltrating immune cells indicates that a patient will not be responsive to immunotherapy. As such, aspects of the invention include methods of therapy determination, where a patient tumor sample is evaluated to assess the tumor microenvironment. Aspects of the invention may further include making a determination to employ an immunotherapy protocol if the tumor microenvironment includes infiltrating immune cells and making a determination to employ a non-immunotherapy treatment regimen if the tumor microenvironment lacks infiltrating immune cells.

Methods as described here also find use in large-scale profiling of single-cell phenotypes derived from model systems (e.g., cultivated cells, organoid cultures, 3D cultures, etc.), model organisms (e.g., mice, rats, monkeys, etc.) and clinical samples derived from normal or pathological conditions (e.g., blood, biopsy, sputum, saliva, etc.). Currently, there is a substantial need for comprehensive characterization of different cell types present in normal and pathological conditions. The disclosed methods and compositions provide an improved technological platform for large-scale discovery of key cellular markers for developing novel diagnostic and prognostic tools.

Transcriptome data, e.g., produced as described above, also finds use in other non-clinical applications, such as predictive and prognostic biomarker discovery applications, evaluation of cancer immunoediting mechanism applications, drug target discovery, and the like.

Compositions

Aspects of the invention further include various compositions. Compositions of the invention may include, e.g., one or more of any of the reaction mixture components described above with respect to the subject methods, where the reaction mixtures may be present in a nucleic acid amplification device, such as in a container of such a device. For example, the compositions may include one or more of a target nucleic acid template (e.g., genomic DNA sample, cDNA sample, RNA sample, etc.), individual cells or a group of cells, a polymerase (e.g., a thermostable polymerase), a ligase, a set of gene specific primers, barcoded oligonucleotides (e.g., donor nucleic acids), primers for cDNA synthesis, dNTPs, NAD, ATP, a salt, a metal cofactor, one or more nuclease inhibitors (e.g., an RNase inhibitor), one or more enzyme-stabilizing components (e.g., DTT), or any other desired reaction mixture component(s). Compositions may vary for the different steps of the disclosed methods. For example, for cDNA synthesis steps the compositions may include only reagents necessary for reverse transcription, and for the subsequent ligation step the composition may employ different buffer, oligonucleotide and enzyme (e.g., DNA ligase) components. Some components of the compositions (e.g., barcoded oligonucleotides) may be immobilized on a solid surface (e.g., plate wall, beads, etc.). Also provided are compositions that include a primer extension product composition and a barcoded primer extension product composition, e.g., as described above. Also provided are barcoded amplicon compositions and NGS libraries, such as described above.

In certain embodiments disclosed in the application, the different compositions are physically separated from each other, e.g., they are present or deposited in different wells of a plate or in microdroplets. For example, compositions comprising a plurality of single barcoded oligonucleotides immobilized on a bead and a single cell sample may be present in each microdroplet. In other embodiments, the plurality of barcoded compositions (e.g., hybridization complexes between target RNAs and barcoded reverse gene specific primers or barcoded primer extension product compositions) are mixed together and used as a mix of different compositions in all follow-up steps. In such instances, the different compositions may include the common components necessary to perform hybridization, enzymatic modification, e.g., primer extension, purification, etc. steps and unique components which are specific for each biological sample, e.g., individual cells, purified nucleic acid from each sample, and sample-specific barcoded oligonucleotide, as described in more detail above.

The subject compositions may be present in any suitable environment. According to one embodiment, the compositions are present in reaction tubes (e.g., a 0.2 mL tube, a 0.5 mL tube, a 1.5 mL tube, or the like) or a well. In certain aspects, the compositions are present in two or more (e.g., a plurality of) reaction tubes or wells (e.g., a plate, such as a 96-well plate). The tubes and/or plates may be made of any suitable material, e.g., polypropylene, or the like. In certain aspects, the tubes and/or plates in which the composition is present provide for efficient heat transfer to the composition (e.g., when placed in a heat block, water bath, thermocycler, and/or the like), so that the temperature of the composition may be altered within a short period of time, e.g., as necessary for a particular hybridization or enzymatic reaction to occur. According to certain embodiments, the composition is present in a thin-walled polypropylene tube, or a plate having thin-walled polypropylene wells. Other suitable environments for the subject compositions include, e.g., a microfluidic chip (e.g., a “lab-on-a-chip device”). The composition may be present in an instrument configured to bring the composition to a desired temperature, e.g., a temperature-controlled water bath, heat block, or the like. The instrument configured to bring the composition to a desired temperature may be configured to bring the composition to a series of different desired temperatures, each for a suitable period of time (e.g., the instrument may be a thermocycler).

In another embodiment, the different compositions are present in or delivered to microwells of microplates with well sizes dimensioned to accommodate individual cells, where the dimensions may be configured to accommodate on average no more than 2 cells, such as no more than 1 cell. Examples of such wells are those found in the plates of the Rhapsody instrument (Becton, Dickinson and Company), the ICELL8 instrument (Takara Bio USA), etc., where such instruments employ plates having approximately 10,000 wells and a deposition protocol for individual cells and single beads.

In another embodiment, the different compositions are present in microdroplets. For example, emulsion PCR may be employed. For emulsion PCR, an emulsion PCR reaction (e.g., in a droplet, droplet microreactor) is created with a “water in oil” mix to generate thousands or millions of micron-sized aqueous compartments. Sources of nucleic acids (e.g., cells, nucleic acid libraries, optionally coupled to solid supports, e.g., beads) are mixed in a limiting dilution prior to emulsification or directly into the emulsion mix. The combination of compartment size and limiting dilution of the nucleic acid sources is used to generate compartments containing, on average, just one source of nucleic acid (e.g., cell, or sample nucleic acid(s), such as cellular nucleic acid, e.g., RNA, combined with a solid support such that the nucleic acids may be stably associated with the solid support (e.g., bead), etc.). Depending on the size of the aqueous compartments generated during the emulsification step, up to 3×10⁹ individual amplification reactions per μl can be conducted simultaneously in the same container, e.g., tube, well, or other suitable container. The average size of a compartment in an emulsion ranges from sub-micron in diameter to over 100 microns, depending on the emulsification conditions. Protocols that may be employed include those that allow one to deliver individual cells with unique barcoded beads and the reagents necessary for the reverse transcription step into separate microdroplets. Such microdroplet technologies include the Chromium instrument (10x Genomics), the ddSeq instrument (Bio-Rad), etc. Microdroplets that include compositions as described above may also be generated and delivered to separate compartments or to oil (to form water-oil droplets) using conventional technologies, e.g., FACS, ink-jet deposition, etc.
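The effect of limiting dilution on compartment occupancy can be estimated with a short calculation. The sketch below assumes that the number of nucleic acid sources (e.g., cells) per compartment follows a Poisson distribution, which is a common modeling assumption and is not stated in this disclosure; the function names and the example loading values are illustrative only.

```python
import math


def poisson_occupancy(mean_per_compartment):
    """Return the fractions of compartments holding 0, exactly 1, or 2+ sources.

    Assumes Poisson-distributed loading with the given mean (an assumption
    made for illustration, not a statement from the disclosure).
    """
    lam = mean_per_compartment
    empty = math.exp(-lam)
    single = lam * math.exp(-lam)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


def singlet_purity(mean_per_compartment):
    """Among occupied compartments, the fraction holding exactly one source."""
    occ = poisson_occupancy(mean_per_compartment)
    return occ["single"] / (occ["single"] + occ["multiple"])


# Illustrative mean loadings (sources per compartment); values are hypothetical.
for lam in (0.05, 0.1, 0.5, 1.0):
    occ = poisson_occupancy(lam)
    print(
        f"mean={lam:<4} empty={occ['empty']:.3f} single={occ['single']:.3f} "
        f"multiple={occ['multiple']:.3f} purity={singlet_purity(lam):.3f}"
    )
```

Under this model, lowering the mean loading raises the share of occupied compartments that hold a single source, at the cost of more empty compartments.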

Kits

Aspects of the present disclosure also include kits. The kits may include, e.g., one or more of any of the reaction mixture components described above with respect to the subject methods. For example, the kits may include one or more of: a set of gene specific primers, barcoded oligonucleotides (e.g., donor nucleic acids, e.g., immobilized on the beads), a polymerase (e.g., a thermostable polymerase, a reverse transcriptase, or the like), a ligase (e.g., DNA ligase), dNTPs, a salt, a metal cofactor, NAD, ATP, one or more nuclease inhibitors (e.g., an RNase inhibitor and/or a DNase inhibitor), one or more molecular crowding agents (e.g., polyethylene glycol, or the like), one or more enzyme-stabilizing components (e.g., DTT), or any other desired kit component(s), such as solid supports, e.g., tubes, beads, microfluidic chips, etc.

Components of the kits may be present in separate containers, or multiple components may be present in a single container. For example, the individual barcoded oligonucleotides could be provided pre-aliquoted in separate wells or attached to/encapsulated with different beads, with a mixture of all beads provided as a kit component. In certain embodiments, it may be convenient to provide the components in a lyophilized form, so that they are ready to use and can be stored conveniently at room temperature.

In addition to the above-mentioned components, a subject kit may further include instructions for using the components of the kit, e.g., to practice the subject method. The instructions are generally recorded on a suitable recording medium. For example, the instructions may be printed on a substrate, such as paper or plastic, etc. As such, the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or subpackaging), etc. In other embodiments, the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g., CD-ROM, diskette, Hard Disk Drive (HDD), portable flash drive, etc. In yet other embodiments, the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g., via the internet, are provided. An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.

The following examples are offered by way of illustration and not by way of limitation.

EXPERIMENTAL

I. Barcoding Via Reverse Gene Specific Primer Extension

The objective of this experiment is to demonstrate performance of expression profiling in sorted cells or cell mixes using a set of gene specific primer pairs, i.e., the 1.3K Cell-Tissue Marker primer set (1.3K hCTM) as detailed in Table 2, using the protocol illustrated in FIG. 1.

A. Hybridization of Gene-Specific Primers and Sample-Barcoded Donor Nucleic Acid to mRNA Sample

Individual cells are isolated using a microfluidic cartridge or sorting technology. Each of the individual cells (or cell samples) is mixed with TLC lysis buffer (Qiagen) (e.g., 5-μl for a single cell), RevGSP-L1 primers (1.3K human Cell Marker primer set, Table 2) and a set of T30-Anc2-BC-UMI-L2 donor nucleic acids immobilized on beads (ChemGenes) (where each set of donor nucleic acids for each cell varies only by the UMI domain and each cell is contacted with its own unique set of donor nucleic acids that differs from any other set used with any other cell by the BC domain). The donor nucleic acids and RevGSP-L1 primers are hybridized to the cellular RNA at 50° C. for 30 min. The structures of the donor nucleic acid and RevGSP primers employed in this step are provided below:

RevGSP-L1 Primer: (SEQ ID NO: 09)
                  L1 domain
3′<RevGSP-AGCACCGACCAGCACCp-5′

T30-Anc2-UMI-BC-L2: (SEQ ID NO: 10)

An example of a T30-Anc2-UMI-BC-L2 donor nucleic acid showing an exemplary UMI8 domain (unique molecular identifier of 8 residues in length) and a BC14 domain (sample barcode of 14 residues in length) is provided below:

(SEQ ID NO: 11)

Beads with immobilized mRNA, dT30-Anc2-BC-UMI and RevGSP-L2 primers produced from different cells/samples are then pooled together and washed with TCW buffer (Qiagen).

B. Ligation of Gene Specific Primers to Donor Nucleic Acids

The double-stranded complexes formed between polyA+ mRNA, T30-Anc2-UMI-BC-L2 donor nucleic acid and RevGSP-L1 primer as shown in FIG. 1 are then contacted under hybridization conditions with a complementary ligation linker oligonucleotide (L1-L2) having the following sequence:

(SEQ ID NO: 12)
A-TCGTGGCTGGTCGTGG--CGGTCGTGCGGTGGT-3′
dT                 L1-L2

Specifically, beads with the immobilized complex between polyA+ mRNA, T30-Anc2-UMI-BC-L2 and RevGSP-L1 as described above are mixed with 10-μl of Multiplex Reverse Transcription master mix (DriverMap kit, Cellecta, Inc., Mountain View, Calif.), where the mix includes the L1-L2 ligation linker (20 nM), Ampligase (Epicentre, 1/250 dilution) and NAD (1 mM) as well as all reagents required for reverse transcription. The resultant reaction mixture is incubated at 50° C. for 30 min. The beads are treated with 1-μl of ExoI (20 units/μl, New England Biolabs) at 37° C. for 30 min, and washed with TCW buffer.

As a result, the L1 and L2 domains of the RT and RevGSP primers are brought together as shown in FIG. 1 and also illustrated in the following structure:

(SEQ ID NOs: 12, 13 and 14)

The Ampligase catalyzes ligation between the L1 and L2 domains, resulting in the BC14, UMI8 and Anc2 domains being transferred to the reverse gene specific primer, such that the reverse gene specific primer is labeled with the BC14-UMI8-Anc2 domains necessary for barcoding and amplification of gene specific extension products as described in the following steps. First strand cDNA is also synthesized in this step, as illustrated in FIG. 1.

C. Forward (FWD) GSP Extension-Second Strand cDNA Production

Resultant barcoded first strand cDNA products immobilized on beads produced in Step B, above, are subjected to a second round of extension to produce second strand cDNA. For this step, Anchor 1-Fwd GSP primers (1.3K hCTM primers) are employed, having the structure:

Anchor 1 (SEQ ID NO: 15)

In this step, the barcoded cDNA products immobilized on beads produced in Step B, above, are combined with 10-μl of Extension master mix (DriverMap kit, Cellecta, Inc., Mountain View, Calif.) that includes 1 nM of each Anchor 1-Fwd GSP primer and DNA polymerase, and incubated at 64° C. for 30 min, followed by treatment with 1-μl of ExoI (20 units/μl, New England Biolabs) at 37° C. for 30 min. The beads are then washed with TCW buffer.

D. 1st PCR

Anchored DNA fragments produced in Step C are amplified in 50-μl of Multiplex DNA polymerase reaction mix (DriverMap kit) with universal anchor PCR primers (Fwd-Anc1 and Rev-Anc2) as shown below for 14-20 cycles.

PCR primers for 1^(st) PCR Step: Fwd-Anc1 (SEQ ID NO: 16)

Rev-Anc2  (SEQ ID NO: 17)

This first round of PCR results in production of sample-barcoded anchor-domain-flanked double-stranded gene specific DNA fragments having the following structure:

(SEQ ID NOs: 18 and 19)

E. 2^(nd) PCR

A 2-μl aliquot of the 1st PCR amplicon product produced in Step D, above, is added to Multiplex DNA polymerase reaction mix (50-μl) and amplified using P7-Anc1 and P5-Anc2 PCR primers with different indexes, as shown below:

2^(nd) PCR amplification primers: P7-Anc1 (SEQ ID NO: 20)

P5-Anc2 (SEQ ID NO: 21)

The reaction mixture is amplified for 7 cycles and then treated with ExoI (0.5 μl) at 37° C. for 30 min. The resultant adaptor-containing sample-barcoded anchor-domain-flanked double-stranded gene specific DNA fragments have the following structure:

(SEQ ID NOs:22 and 23)

The resultant adaptor-containing sample-barcoded anchor-domain-flanked double-stranded gene specific DNA fragments are analyzed by gel (if necessary, combined in equal amounts based on the smear analysis (180-400 bp range) for several samples), and purified using 1.8× volume of AMPure magnetic beads (Beckman Coulter). The purified cDNA products are quantitated by Qubit fluorescence measurement, and diluted to 10 nM (2 ng/μl) for next-generation sequencing using the NextSeq500 Illumina platform.
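The stated equivalence of 10 nM and 2 ng/μl implies an average fragment length, which can be checked with a short conversion. The sketch below uses the common approximation of about 660 g/mol per base pair of double-stranded DNA; that constant and the function names are assumptions introduced for illustration and do not appear in the protocol.

```python
# Approximate average molar mass of one base pair of double-stranded DNA.
# This is a widely used rule of thumb, not a value taken from the protocol.
AVG_BP_MASS_G_PER_MOL = 660.0


def nanomolar_to_ng_per_ul(conc_nM, length_bp):
    """Convert a molar concentration of dsDNA fragments to ng/ul."""
    molar_mass = length_bp * AVG_BP_MASS_G_PER_MOL  # g/mol
    # nM * g/mol = ng/L * 1e0 -> divide by 1e6 to express as ng/ul
    return conc_nM * molar_mass / 1e6


def implied_length_bp(conc_nM, ng_per_ul):
    """Average fragment length implied by a paired nM and ng/ul statement."""
    return ng_per_ul * 1e6 / (conc_nM * AVG_BP_MASS_G_PER_MOL)


# 10 nM at 2 ng/ul, as stated for the library dilution above.
print(round(implied_length_bp(10, 2.0)))
```

Under this approximation, 10 nM at 2 ng/μl corresponds to an average fragment of roughly 303 bp, which falls inside the 180-400 bp smear range used for pooling.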

II. Variations of Example 1 Protocol

A. In a variation of the protocol described in Example I, only RevGSP-L1 primers are hybridized with polyA+ mRNA in solution, following which the resultant hybridization complexes are purified. The purified hybridization complexes are then combined with T30-Anc2-UMI-BC-L2 donor nucleic acids, or a variant thereof lacking the T30 domain, in the presence of the L1-L2 ligation linker under ligation conditions to join the BC-UMI domains to the RevGSP-L1 primers.

B. In a second variation of the protocol described in Example I, RevGSP-L1 primers are hybridized with polyA+ mRNA in conjunction with oligo dT immobilized on beads or test tubes, following which the resultant hybridization complexes are purified. The purified hybridization complexes are then combined with Anc2-UMI-BC-L2 in the presence of the L1-L2 ligation linker under ligation conditions to join the BC-UMI domains to the RevGSP-L1 primers.

III. Barcoding Via Circular Intermediate

A. First Strand cDNA Synthesis.

50 ng of total RNA (human Brain RNA, Thermo-Fisher) is mixed with a set of T25-Anc1-UMI-BC-L2 donor nucleic acids that differ from each other with respect to the UMI domain at a 1 μM final concentration. Each of the T25-Anc1-UMI-BC-L2 donor nucleic acids is an oligo dT primer configured for cDNA synthesis, where the primers include a sample specific barcode domain (BC14), a unique molecular index domain (UMI8), a ligation linker domain (L2), an anchor domain (Anc1) and a template binding domain (oligo dT25VN). The T25-Anc1-UMI-BC-L2 donor nucleic acids have the following structure:

(SEQ ID NO: 24)

The resultant reaction mixture is treated at 72° C. for 2 min, cooled to 4° C. and reverse transcribed in 1× RT reaction buffer using Maxima reverse transcriptase (Thermo-Fisher) at 50° C. for 30 min in a 10-μl reaction mix (Thermo-Fisher), following which the RT is inactivated at 95° C. for 5 min. The resultant first strand cDNA product composition is combined with a second first strand cDNA product composition prepared from 50 ng of human universal RNA (Agilent Technologies) using a second set of T25-Anc1-UMI-BC-L2 RT primers having a BC domain different from that used with the human Brain RNA. The resultant pooled samples are purified using an RNA/DNA micro isolation kit (Qiagen) according to the manufacturer's protocol and eluted in 14-μl of water.

B. Forward Gene-Specific Primer Extension Second Strand cDNA Synthesis and Ligation Step.

14 μl of the first strand cDNA composition produced as described in Step A, above, is mixed with 11 μl of Multiplex DNA polymerase master reaction mix (DriverMap kit, Cellecta) that includes the pool of L-1 Forward 1.3K CTM GSP primers (Table 2) (final concentration 1 nM of each primer). The L-1 Forward gene specific primers (L-1 Fwd GSPs) have the following structure:

L1-Fwd GSP (SEQ ID NO: 25)

Hybridized L1-Fwd GSPs are extended for 1 cycle at 64° C. extension temperature (30 min), treated with 1 μl of ExoI (New England BioLabs) for 30 min at 37° C., and heated at 95° C. for 5 min.

The resultant forward gene specific primer extension product composition (i.e., second strand cDNA composition) is then mixed with DNA ligation master mix (5 μl) that includes DNA polymerase master mix with L1-L2 ligation linker oligonucleotide (20 nM) complementary to both ends of the forward primer driven second strand cDNA and 1× buffer with Ampligase DNA ligase (10 units, Epicentre) and 1 mM NAD (Sigma). The L1-L2 ligation linker has the following structure:

(SEQ ID NO: 26)

Combination of the forward gene specific primer extension product (second strand cDNA) composition with the L1-L2 linker results in production of a circular nucleic acid loop intermediate made up of the second strand cDNA whose ends are held together by the L1-L2 linker, as illustrated below:

(SEQ ID NOs: 26, 27 and 28)

The resultant reaction mixture is ligated for 5 cycles (95° C. for 20 sec, 65° C. for 1 min). Ligated circle products are purified by AMPure beads (1.8× volume, Beckman Coulter) according to the manufacturer's protocol. In the course of ligation of the above circular intermediate by DNA ligase (between the L1 and L2 linker domains), the universal primer binding domain (Anc1), unique molecular index domain (UMI8) and sample barcode domain (BC14) are transferred to the forward gene specific domain in the final single-stranded circular structure, as illustrated in FIG. 2.

C. Reverse Gene Specific Primer Extension Step

The ligated second strand circular cDNA composition produced in Step B, above, is then subjected to a reverse gene specific primer (RevGSP) round of extension with 5 μl of DNA polymerase master mix that includes 1.3K CTM reverse gene specific primers which include a second anchor domain, i.e., Anc2. The Anc2-Rev GSPs have the following structure:

Anc2-Rev GSP (SEQ ID NO: 29)

For this step, the same conditions as employed for the first forward gene-specific primer extension step are used, and the resultant product is treated with 1-μl of ExoI at 37° C. for 30 min, and then heated at 95° C. for 5 min.

D. First PCR

Anchored cDNA fragments produced in Step C, above, are then amplified in 100-μl of Multiplex DNA polymerase reaction mix (DriverMap kit, Cellecta) with universal anchor PCR primers (Anc1 and Anc2) for 16 PCR cycles (98° C. for 10 sec, 72° C. for 20 sec). The universal anchor PCR primers have the following structures:

Anc1 (SEQ ID NO: 30)

Anc2 (SEQ ID NO: 31)

This first PCR results in the production of sample-barcoded anchor-domain-flanked double-stranded gene specific deoxyribonucleic acid (DNA) fragments having the structure:

Amplicon structure after 1^(st) PCR step. (SEQ ID NOs: 32 and 33)

E. Second PCR

A 5-μl aliquot of the first PCR product produced as described in Step D, above, is combined with Multiplex DNA polymerase reaction mix (100-μl) and amplified using forward and reverse PCR primers (P7-Anc1 and P5-Anc2) for 8 cycles. The forward and reverse PCR primers (P7-Anc1 and P5-Anc2) have the following structure:

P7-Anc1 (SEQ ID NO: 34)

P5-Anc2 (SEQ ID NO: 35)

The resultant amplicon composition is treated with ExoI (1-μl) at 37° C. for 30 min. This second PCR results in the production of sequencing-adaptor-containing sample-barcoded anchor-domain-flanked double-stranded gene specific deoxyribonucleic acid (DNA) fragments having the structure:

Amplicon structure after second PCR step. (SEQ ID NOs: 36 and 37)

The resultant adaptor-containing sample-barcoded anchor-domain-flanked double-stranded gene specific deoxyribonucleic acid (DNA) fragments are analyzed by gel, combined in equal amounts based on the smear analysis (220-480 bp range) and purified using AMPure magnetic beads (1.8× volume, Beckman Coulter) according to the manufacturer's protocol.

The purified adaptor-containing sample-barcoded anchor-domain-flanked double-stranded gene specific deoxyribonucleic acid (DNA) fragments are quantitated by OD260 measurement, and diluted to 10 nM (2.2 ng/μl) for next-generation sequencing on the NextSeq500 Illumina platform using the following program: Read 1: RevSS-SeqDNA >34 cycles; Ind 1: RevSS-SeqInd >14 cycles; Ind 2: FwdSS-SeqMB >8 cycles; Read 2: FwdSS-SeqDNA >34 cycles. The sequencing primers have the following structure:

FwdSS-SeqDNA (SEQ ID NO: 38) ACGACCGCCACGACCAGCCACGA
FwdSS-SeqMB (SEQ ID NO: 39)

RevSS-SeqDNA (SEQ ID NO: 40) ACTACACACGAGCACCGACCAGCACAGA
RevSS-SeqInd (SEQ ID NO: 41) TGGTCGTGGCGGTCGTGCGGTGGT
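Once reads from the pooled library are available, molecules can be counted per sample and per gene by collapsing reads that share a UMI. The sketch below is a minimal illustration, not the analysis pipeline of this disclosure. It assumes that the 14-cycle index read reports the BC14 sample barcode and the 8-cycle index read reports the UMI8 domain (an inference from the matching lengths), and that each read has already been assigned to a gene; the barcode sequences, sample names and function name are hypothetical.

```python
from collections import defaultdict


def count_molecules(reads, sample_by_barcode):
    """Count distinct UMIs per (sample, gene).

    reads: iterable of (gene, barcode, umi) tuples, one per sequenced fragment.
    sample_by_barcode: dict mapping a known sample barcode to a sample name.
    Reads with an unknown barcode or an ambiguous base (N) in the UMI are skipped.
    """
    umis = defaultdict(set)
    for gene, barcode, umi in reads:
        sample = sample_by_barcode.get(barcode)
        if sample is None or "N" in umi:
            continue
        umis[(sample, gene)].add(umi)
    return {key: len(values) for key, values in umis.items()}


# Hypothetical 14-nt sample barcodes and 8-nt UMIs, for illustration only.
SAMPLES = {"ACGTACGTACGTAC": "brain", "TGCATGCATGCATG": "universal"}
READS = [
    ("GAPDH", "ACGTACGTACGTAC", "AACCGGTT"),
    ("GAPDH", "ACGTACGTACGTAC", "AACCGGTT"),  # same UMI: counted once
    ("GAPDH", "ACGTACGTACGTAC", "TTGGCCAA"),
    ("GAPDH", "TGCATGCATGCATG", "AACCGGTT"),
    ("ACTB", "TGCATGCATGCATG", "ACGTNCGT"),   # skipped: ambiguous UMI base
    ("ACTB", "GGGGGGGGGGGGGG", "ACGTACGT"),   # skipped: unknown barcode
]

counts = count_molecules(READS, SAMPLES)
print(counts)
```

Exact-match collapsing is the simplest choice; tolerance for sequencing errors in barcodes or UMIs would need additional logic that is not shown here.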

IV. Protocol Employing Sample Barcoded Reverse Gene Specific Primer

A. Design of Barcoded Reverse Gene Specific Primers.

(SEQ ID NO: 42, 43, and 84)
3′           L2     5′3′  L1                      Anchor 2       5′
RevGSP-ACCGACCAGCACCp GCCAGCACGCCA-(Barcode)-AGACACGACCAGCCACGAGCA-X-Bead
     A-TGGCTGGTCGTGG--CGGTCGTGCGGT-3′
dT                    Link1s

Barcoded oligonucleotides with the minimum structure 5′-Anchor 2-Barcode-Linker L1-3′ are ligated to the reverse gene specific primer set (RevGSP) with the minimum structure 5′-phosphate-Linker L2-RevGSP-3′ using the oligonucleotide Link1s, which is complementary to linker L1 and linker L2, and DNA ligase under ligation conditions.

The DNA ligation reaction attaches barcoded anchor oligonucleotides to reverse gene specific primers. As a result of the ligation reaction, the set of reverse gene specific primers is labeled with a specific barcode. The set of barcoded reverse gene specific primers is purified from non-ligated products and used in the disclosed primer extension assay. The same set of gene specific primers could be labeled with a plurality of different barcodes using the same protocol. In another embodiment, the same protocol could be used for barcoding a set of forward gene specific primers.

Barcode-Anchor oligonucleotides are attached to the solid surface (e.g., beads) through linker X (e.g., X could be a cleavable linker). Furthermore, different binding moieties (e.g., antibodies) may be attached to the beads to provide binding of the antibody-bead-barcoded GSP complex to specific cell types through antigen-antibody interactions.

Importantly, each barcode could have a complex structure as described in the application in more detail. These complex composite barcodes could have several domains, including but not limited to:

1) Sample barcode: specific sequence (usually from 8-14 nt) attached to a set of gene-specific primers, to label all extension products derived from a target RNA sample.

2) UMI: complex random, semi-random (usually 8-12 nt), or set of unique specific sequences which allow each molecule used in the disclosed primer extension assay to be labeled with a unique sequence/barcode. A UMI could be added to the RevGSP-Linker 2 set between RevGSP and linker L2.

3) Bead barcode: specific sequence (10-16 nt) unique for each bead if gene-specific primers are attached to the beads. In some embodiments, e.g., for single cell analysis applications (e.g., if only one biological sample is used in the assay), the bead barcode could be the sample barcode.

4) Antibody barcode: specific sequence unique for each specific antibody immobilized to the beads.

Linker L1, linker L2 and complementary Link1s could be designed with a variety of different sequences with a minimum length of 4 nt each.

Examples of Anchor2-Barcode-Linker 1 oligonucleotides used in ligation reaction:

Barcodes are underlined

Anc2-BC1-L1 (SEQ ID NO: 44) ACGAGCACCGACCAGCACAGA GAACAAACACCGCACGACCG
Anc2-BC2-L1 (SEQ ID NO: 45) ACGAGCACCGACCAGCACAGA GGCGAAACACCGCACGACCG
Anc2-BC3-L1 (SEQ ID NO: 46) ACGAGCACCGACCAGCACAGA GCAAAAGGACCGCACGACCG

Example of Bead-barcoded oligonucleotide conjugates (synthesized by Chemgenes, Inc.) used in ligation reaction.

In the diagram below: T25 is the oligo dT (25 nt) moiety used for binding to the beads and purification of hybridization complexes between target RNA and barcoded reverse gene specific primers; PClinker is a photocleavable linker, and SSlinker is a bisulfite linker cleaved by sulfite ions (e.g., DTT treatment), used for detachment of reverse barcoded gene specific primers from the beads; Anchor2 is the binding site for the universal amplification primer; UMI is the unique molecular index; Barcode is the sample-specific 6 nt barcode (underlined); Linker L2 is the sequence necessary for ligation of barcodes with the gene specific primer set.

ChemB-T25-PC1-Anc2-BC-L2 (SEQ ID NO: 48)
                                 Anchor 2   UMI    Barcode Linker L2
Bead-linker-T25-PClinker-AGCACCGACCAGCACAGAVVNVVNVVCATCAGACCGCACGACCG-3′

ChemB-T25-SS-Anc2-BC-L2 (SEQ ID NO: 50)
                                 Anchor 2   UMI    Barcode Linker L2
Bead-linker-T25-SSlinker-AGCACCGACCAGCACAGAVVNVVNVVCAGCATGACCGCACGACCG-3′

Example of final barcoded reverse gene specific primer structure employed in the assay

(SEQ ID NO: 51)
3′           L2-L1                           Anchor2            5′
RevGSP-ACCGACCAGCACCGCCAGGACGCCA-(Barcode)-AGACACGACCAGCCACGAGGA

wherein the L2-L1 linker sequence is generated by ligation of the L1 and L2 linkers, Barcode is a complex barcode, as described in more detail above, and Anchor2 is a universal primer binding site. A similar structure could be generated for a barcoded forward gene specific primer set and employed in the disclosed assay:

(SEQ ID NO: 52)
3′             L2-L1                           Anchor1         5′
FwdGSP-ACCGACCAGCACCGCCAGGACGCCA-(Barcode)-ACAGACGACCAGCCACGACGA

In some embodiments, the barcoded reverse gene specific primer composition could be synthesized by combinatorial (pool and split) chemical synthesis without a DNA ligation step. In this embodiment, the L2-L1 linker will be missing in the final structure.
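The size of the barcode set reachable by pool and split synthesis grows multiplicatively with the number of rounds. The sketch below enumerates the composite barcodes for a small case. It assumes, purely for illustration, that each round appends one block from a fixed list of sub-barcode blocks; the block sequences, block length, number of rounds and function name are all hypothetical and are not taken from this disclosure.

```python
from itertools import product


def split_pool_barcodes(rounds):
    """Enumerate every composite barcode reachable by pool and split synthesis.

    rounds: list of lists; each inner list holds the sub-barcode blocks
    available in that round. One block per round is appended, in round order.
    """
    return ["".join(blocks) for blocks in product(*rounds)]


# Hypothetical 4-nt blocks for three rounds; not sequences from the disclosure.
ROUND_BLOCKS = [
    ["ACGT", "CATG", "GTAC", "TGCA"],
    ["AACC", "CCGG", "GGTT", "TTAA"],
    ["AGAG", "CTCT", "GAGA", "TCTC"],
]

barcodes = split_pool_barcodes(ROUND_BLOCKS)
print(len(barcodes))   # 4 * 4 * 4 composite barcodes
print(barcodes[:3])
```

With four blocks in each of three rounds the set contains 64 distinct 12-nt barcodes; adding rounds or blocks scales the set accordingly.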

B. High-throughput Synthetic Template and Calibration Control Template

Oligonucleotide Synthesis

Oligonucleotide libraries consisting of complex mixtures of oligonucleotides ranging in length from 150-250 base pairs were manufactured by Agilent Technologies under contract. The oligonucleotide design comprises full-length (or truncated in the middle) sequences of all amplicons flanked by gene specific primers, surrounding amplicon RNA sequences (for amplicons shorter than 200 bp) and two point mutations downstream (4 nt) from the 3′-end of each forward and reverse primer. In some embodiments, the oligonucleotides comprise the sequence of the T7 promoter upstream of the forward primer binding site in order to generate sense synthetic template RNAs that mimic natural target RNAs surrounding amplicon domains. Oligonucleotides were synthesized in spatially distinct locations using standard phosphoramidite chemistry on a silylated 6.625×6 inch wafer using an automated tool designed by Agilent Technologies. The solid support used in synthesis was a flat, non-porous silane coated glass rather than the locally curved, porous surface traditionally used. The coupling steps used inkjet-printing technologies to deliver the appropriate amount of activator and phosphoramidite monomer to specific spatial locations on the solid support under anhydrous conditions. Oxidation and detritylation reactions were performed in dedicated flowcells using novel mechanical operations and fluid management steps to eliminate the depurination side reaction limiting synthesis of long oligonucleotides. After deprotection and release, oligonucleotides were recovered and concentrated by lyophilization in 2 mL tubes. Each Oligo Library yields 10 pmol of nucleic acid material equally divided among up to 55,000 user-defined, unique sequences. In another embodiment, the synthetic templates were synthesized using conventional phosphoramidite chemistry and mixed together at approximately equal concentration by IDT and MWG-Operon companies.

C. Calibration Control RNA Template Synthesis by Gene Assembly

The calibration control genes which mimic natural genes were synthesized by GeneScript Technologies using a modified Gibson gene assembly protocol. In one embodiment, the calibration control genes comprising a T7 promoter, full-length target mRNA sequences including an amplicon domain with at least 1-2 point mutations downstream of the primer binding site, and polyA (approximately 50 nt) were synthesized and cloned in the GeneScript vector, and clones were validated by Sanger sequencing. To generate a set of calibration control templates, plasmid DNA clones corresponding to the set of control genes were mixed together (or used separately) and digested by NotI restriction enzyme at a site located downstream of the polyA site. The linearized plasmids were then used as templates for RNA synthesis using T7 RNA polymerase and the manufacturer's protocol (MonsterScript kit, Epicentre Technologies).

D. High-throughput Gene Specific Primer Validation

Multiplex PCR primers with cognate target sequences were screened en masse. In some embodiments, the set of barcoded reverse gene specific primers (with the structure shown above) was first hybridized to control natural or synthetic template RNAs. The hybrids between target mRNA and barcoded reverse gene specific primers were then combined together, purified and used as a mix in the follow-up primer extension and amplification steps. In preferred embodiments, the hybridization step was performed with the RNA sample and barcoded reverse gene specific primers in solution (e.g., primers released from beads). As discussed in more detail above, the selection of primers with high hybridization efficiency and stability of target mRNA-primer complexes is the critical step which defines the overall performance of the assay and cross-talk between different samples. Moreover, using the barcoded reverse primers in the first step of the protocol allows all samples to be combined together and therefore allows scale-up of the assay for analysis of hundreds to thousands of samples in a single test tube format.

In another embodiment, the natural or synthetic template RNAs are reverse transcribed, e.g., from a random primer, and the synthesized cDNAs are used as templates for the extension step, using barcoded forward gene specific primers, and follow-up amplification steps.

In both protocols, uniformity of amplification, including primer efficiency, primer specificity and dynamic range (minimum 100-fold), was determined from multiplex reaction kinetic data. In order to reliably measure expression of different genes, a panel of 15 different human universal RNAs from different commercial sources (Agilent, Clontech, BioChain, Qiagen, etc.) and synthetic template RNA were used as templates for cDNA synthesis. Non-specific primer activities were measured by the yield of non-targeted products from human universal RNAs and negative control templates (human genomic DNA and mouse universal RNAs). The protocol for testing primer performance was repeated several times with sets of 3-5 PCR primer pairs per gene until primers with high specific and low non-specific activity were selected. Finally, functionally validated primers were selected as experimentally validated primers for use in sets of experimentally validated gene specific primers.
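The selection among 3-5 candidate primer pairs per gene can be pictured as a ranking on specific versus non-specific yield. The sketch below is only one possible way to express that idea and is not the scoring used in this disclosure: it assumes that read counts for targeted and non-targeted products are available per candidate pair, and the on-target threshold, gene names, pair identifiers and counts are hypothetical.

```python
def on_target_fraction(target_reads, off_target_reads):
    """Fraction of a primer pair's products that map to the intended target."""
    total = target_reads + off_target_reads
    return target_reads / total if total else 0.0


def select_primer_pairs(candidates, min_on_target=0.9):
    """Pick, per gene, the candidate pair with the highest on-target fraction.

    candidates: iterable of (gene, pair_id, target_reads, off_target_reads).
    Pairs below min_on_target are rejected; a gene with no passing pair is
    absent from the result and would need another round of primer design.
    """
    best = {}
    for gene, pair_id, target, off_target in candidates:
        score = on_target_fraction(target, off_target)
        if score < min_on_target:
            continue
        if gene not in best or score > best[gene][1]:
            best[gene] = (pair_id, score)
    return best


# Hypothetical screening counts, for illustration only.
CANDIDATES = [
    ("GENE_A", "pair1", 900, 100),
    ("GENE_A", "pair2", 980, 20),
    ("GENE_A", "pair3", 400, 600),
    ("GENE_B", "pair1", 500, 500),
    ("GENE_B", "pair2", 700, 300),
]

selected = select_primer_pairs(CANDIDATES)
print(selected)
```

In this toy input GENE_A resolves to its second pair, while GENE_B has no pair above the threshold, mirroring the repeated rounds of testing described above.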

E. Multiplex RT-PCR Assay

1. Design of Primers for Anchor Addition, First and Second PCR Steps

Design of Barcoded Forward and Barcoded Reverse gene specific primers with anchor1 (Fwd-anchor1-GSP primers) and anchor2 (Rev-anchor2-GSP primers) with 3′-extended suppression portions for primer extension steps and universal PCR primers (F-MP1GAC and R-MP2CAG) to amplify anchored cDNA fragments by PCR.

Sequences that are underlined are the common PCR suppression portions, and those in italics and bold are unique sequences for forward or reverse primers, respectively. GSP is the gene-specific primer domain. BC-Link is the Barcode-Linker domain, which comprises the composite barcode as described in more detail above and could be present only in reverse (preferred embodiment), only in forward, or in both reverse and forward primers (SEQ ID NOS: 53 to 56).

F-MP1GAC AGCAGCACCGACCAGCAGAC
         AGCACCGACCAGCAGACA(BC-Link)FwdGSP>
Fwd-Anc1-GSP                    cDNA            Rev-Anc2-GSP
                                <RevGSP(Link-BC)AGACACGACCAGCCACGA
                                                 GACACGACCAGCCACGA GCA
                                                           R-MP2CAG

For simplicity, the structures below show the design of primers and amplification products only for the preferred embodiment of using a barcoded reverse and non-barcoded forward gene specific primer set:

(SEQ ID NOS: 57 to 60)        F-MP1GAC AGCAGCACCGACCAGCAGAC   AGCACCGACCAGCAGACA-FwdGSP>     Fwd-Anc1-GSP            cDNA           Rev-Anc2-GSP                                 <RevGSP(BC-Link)AGACACGACCAGCCACGA                                                  GAC ACGACCAGCCACGA GCA                                                               R-MP2CAG

The resultant structure of amplified cDNA products after the twosequential primer extension steps using Barcoded Rev-anchor2-GSPs andFwd-anchor1-GSPs and a first PCR step using universal F-MP1GAC andR-MP2CAG primers is shown below:

(SEQ ID NOS: 61 and 62)

                             (60-250 nt)
    AGCAGCACCGACCAGCAGACA-FwdGSP-cDNA-RevGSP-Link-BC-TCTGTGCTGGTCGGTGCTCGT
    TCGTCGTGGCTGGTCGTCTGT-FwdGSP-cDNA-RevGSP-Link-BC-AGACACGACCAGCCACGAGCA
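As a quick consistency check on the structure above, the lower strand should be the base-by-base complement of the upper strand when both are read left to right. The Python sketch below verifies this for the two anchor regions only; the function name `complement` and the variable names are illustrative and are not part of the disclosure.

```python
# Complement each base without reversing, because the second strand in the
# structure is written 3'->5' directly beneath the first strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence (no reversal)."""
    return seq.upper().translate(COMPLEMENT)

# Anchor regions copied from the amplified product structure.
top_left, top_right = "AGCAGCACCGACCAGCAGACA", "TCTGTGCTGGTCGGTGCTCGT"
bottom_left, bottom_right = "TCGTCGTGGCTGGTCGTCTGT", "AGACACGACCAGCCACGAGCA"

# True only if both anchor regions pair correctly along their full length.
anchors_match = (complement(top_left) == bottom_left
                 and complement(top_right) == bottom_right)
print(anchors_match)
```

The same helper can be applied to the P7/P5-flanked amplicon strands produced by the second PCR step.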

The first PCR amplified cDNA products were then subjected to a second round of PCR to add Illumina P7 and P5 sequencing adaptors. PCR primers for the second PCR step comprise anchor 1 and anchor 2 binding domains, indexing domains (highlighted in red; these are optional and can be used if the experiment requires combining different samples for the NGS step) and the P5 or P7 sequences necessary for cluster formation in the Illumina NGS instrument, as illustrated below:

Set of Forward Indexing Primers for 2^(nd) PCR step (SEQ ID NOS: 63-68):

    FP7-A1Ind-A    AGCAGAAGACGGCATACGAGATATACGACAGCAGCAGCACCGACCAGCAGACA
    F7-A1Ind-B     AGCAGAAGACGGCATACGAGATACTGATGAGCAGCAGCACCGACCAGCAGACA
    F7-A1Ind-C     AGCAGAAGACGGCATACGAGATAGCATCAAGCAGCAGCACCGACCAGCAGACA
    FP7-A1Ind-D    AGCAGAAGACGGCATACGAGATAAGTCGTAGCAGCAGCACCGACCAGCAGACA
    FP7-A1Ind-E    AGCAGAAGACGGCATACGAGATATCGCATAGCAGCAGCACCGACCAGCAGACA
    FP7-A1Ind-F    AGCAGAAGACCGCATACGAGATACATAGCAGCAGCAGCACCGACCAGCAGACA

Set of Reverse Indexing Primers for 2^(nd) PCR step (SEQ ID NOS: 69-74):

    RP5-A2Ind-A    ACGGCGACCACCGAGATCTACACATACGACACGACGAGCACCGACCAGCACAGA
    RP5-A2Ind-B    ACGGCGACCACCGAGATCTACACACTGATGACGACGAGCACCGACCAGCACAGA
    RP5-A2Ind-C    ACGGCGACCACCGAGATCTACACAGCATCAACGACGAGCACCGACCAGCACAGA
    RP5-A2Ind-D    ACGGCGACCACCGAGATCTACACAAGTCGTACGACGAGCACCGACCAGCACAGA
    RP5-A2Ind-E    ACGGCGACCACCGAGATCTACACATCGCATACGACGAGCACCGACCAGCACAGA
    RP5-A2Ind-F    ACGGCGACCACCGAGATCTACACACATAGCACGACGAGCACCGACCAGCACAGA

Set of Forward and Reverse Non-indexing Primers for 2^(nd) PCR step (SEQ ID NOS: 75-76):

    Fp7-A1    AGCAGAAGACGGCATACGAGATAGCAGCAGCACCGACCAGCAGACA
    RP5-A2    ACGGCGACCACCGAGATCTACACACGACGAGCACCGACCAGCACAGA

After a second PCR step with Forward and Reverse indexing primers, the final amplicon structure, flanked with Illumina P7 and P5 adaptor sequences and ready for NGS, is shown below:

(SEQ ID NOS: 77-78)

    P7(Ind)AGCAGCACCGACCAGCAGACA-FwdGSP-cDNA-RevGSP(LinkBC)TCTGTGCTGGTCGGTGCTCGT(Ind)P5
    P7(Ind)TCGTCGTGGCTGGTCGTCTGT-FwdGSP-cDNA-RevGSP(LinkBC)AGACACGACCAGCCACGAGCA(Ind)P5

The sequences of primers for NGS sequencing (e.g., Illumina NextSeq500 platform) of cDNA inserts, barcode domain and indexes are provided below:

    SeqDNAlink-Rev (SEQ ID NO: 79)    TGGCGTGCTGGCGGTGCTGGTCGGT
    SeqDNA-Fwd (SEQ ID NO: 80)        AGCAGCAGCACCGACCAGCAGACA
    SeqBarcode-Fwd (SEQ ID NO: 81)    ACCGACCAGCACCGCCAGCACGCCA

Optional sequencing primers:

    SeqIND-Fwd (SEQ ID NO: 82)    TCTGTGCTGGTCGGTGCTCGTCGT
    SeqIND-Rev (SEQ ID NO: 82)    TGTCTGCTGGTCGGTGCTGCTGCT
    SeqDNA-Rev (SEQ ID NO: 83)    ACGACGAGCACCGACCAGCACAGA

An example protocol for NGS sequencing of amplified cDNA products in a NextSeq500 machine using a 150-nt sequencing kit is shown below:

    Read 1: SeqDNAlink-Rev > 81 cycles
    Ind 1: SeqIND-Rev > 6 cycles
    Ind 2: SeqBarcode-Fwd > 38 cycles
    Read 2: SeqDNA-Fwd > 35 cycles

The number of read cycles for the SeqBarcode-Fwd primer may depend on the design of the specific barcode domain cassette. A read length of 38 cycles was selected for reading a complex sample barcode domain with the structure: Antibody barcode(6)-Sample barcode(6)-Bead barcode(14)-UMI(12).
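The field lengths in this cassette sum to 38 nt (6 + 6 + 14 + 12), which is why 38 cycles are used. The Python sketch below shows how such a read might be split into its fields. It assumes the read presents the fields in the order listed, which the text does not state explicitly; the function name, field names and example read are hypothetical.

```python
# Field names and lengths follow the cassette described in the text:
# Antibody barcode(6)-Sample barcode(6)-Bead barcode(14)-UMI(12).
# Assumption: the read presents the fields in this same order.
FIELDS = [("antibody_barcode", 6), ("sample_barcode", 6),
          ("bead_barcode", 14), ("umi", 12)]

def split_barcode_read(read: str) -> dict:
    """Slice a barcode read into its fields, in the order listed in FIELDS."""
    total = sum(length for _, length in FIELDS)
    if len(read) < total:
        raise ValueError(f"read has {len(read)} nt, expected at least {total}")
    parts, pos = {}, 0
    for name, length in FIELDS:
        parts[name] = read[pos:pos + length]
        pos += length
    return parts

# Hypothetical 38-nt read, made up for illustration only.
example = "AAAAAA" + "CCCCCC" + "G" * 14 + "T" * 12
print(split_barcode_read(example))
```

A cassette with a different layout would only require editing `FIELDS`, and the cycle count for the barcode read would change accordingly.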

F. Protocol for Multiplex RT-PCR amplification of target genes for expression profiling or mutation analysis starting from total RNA (1 ng-50 ng) mixed with calibration control RNA templates and using a barcoded forward gene specific primer set.

Step 1. Total RNAs (mixed with synthetic calibration control RNA templates in separate wells) were converted to cDNA in 10 μl of reaction mix using random primer (N6, 5 μM), 1×GC buffer, dNTP (500 μM) and Maxima Reverse Transcriptase (10 units, Thermo-Fisher) at 50° C. for 30 min.

Step 2. cDNA was primed (adding universal anchor 1 and barcodes) using a mix of Barcoded Forward-anchor1-GSP primers (5 nM final concentration for each primer) in a 20-μl reaction mix comprising 1×GC buffer, dNTP (250 μM) and Phusion II (4 units, Thermo-Fisher) for 1 cycle at (98° C. for 1 min, 64° C. for 30 min).

Step 3. Barcoded cDNA products from the first primer extension step were combined and purified using an equal volume of AMPure magnetic beads (Beckman-Coulter) according to the manufacturer's protocol. Eluted cDNA (20 μl) was treated with exonuclease I (20 units, New England BioLabs) at 37° C. for 30 min.

Step 4. The Barcoded DNA products generated in Step 3 were further extended (adding universal anchor 2) using a mix of Reverse-anchor2-GSPs in a 25-μl reaction mix comprising 1×GC buffer, dNTP (250 μM) and Phusion II (5 units, Thermo-Fisher) for 1 cycle at (98° C. for 1 min, 64° C. for 30 min) and treated with exonuclease I (20 units) at 37° C. for 30 min.

Step 5. 1^(st) PCR step. The whole volume (25 μl) of barcoded anchored cDNA fragments (from Step 4) was amplified in a 75-μl reaction mix comprising 1×HF Buffer, dNTP (200 μM), universal PCR primers F-MP1GAC and R-MP2CAG and Phusion II (15 units, Thermo-Fisher) for 8-20 cycles (50 ng-1 μg of starting RNA, respectively) at (98° C. for 10 sec, 72° C. for 20 sec).

Step 6. 2^(nd) PCR step. A 5-μl aliquot of the 1st PCR was amplified in 100 μl of PCR mix comprising 1×HF Buffer, dNTP (200 μM), indexed (specific for each of several samples) or non-indexed (only for one sample) forward and reverse PCR primers and Phusion II (20 units, Thermo-Fisher) for 7 cycles at (98° C. for 10 sec, 72° C. for 20 sec).

Step 7. The amplified PCR products were analyzed in a 3.5% agarose-1×TAE gel to optimize the cycle number and finally digested with exonuclease I (20 units, New England Biolabs) at 37° C. for 30 min, inactivated at 65° C. for 15 min and purified on a Qia PCR column. Purified PCR products were quantitated by Qubit (Thermo-Fisher) and, if necessary, different samples were mixed together (in equal amounts), diluted to 10 nM and sequenced in a NextSeq500 using the Illumina paired-end protocol and reagents for 150 cycles.
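Diluting to 10 nM in Step 7 requires converting the Qubit mass concentration into a molar concentration. A minimal sketch of that arithmetic follows, assuming double-stranded product, an average mass of about 660 g/mol per base pair, and a mean amplicon length estimated separately (for example from the gel). The function names and the example numbers are hypothetical and are not taken from the protocol.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA

def ng_per_ul_to_nm(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/ul) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_length_bp)

def dilution_volumes(stock_nm: float, target_nm: float, final_ul: float):
    """Return (stock volume, diluent volume) in ul needed to reach target_nm."""
    if stock_nm < target_nm:
        raise ValueError("stock is already below the target concentration")
    stock_ul = final_ul * target_nm / stock_nm
    return stock_ul, final_ul - stock_ul

# Hypothetical library: 13.2 ng/ul with a mean amplicon length of 200 bp.
stock = ng_per_ul_to_nm(13.2, 200)
stock_ul, diluent_ul = dilution_volumes(stock, target_nm=10.0, final_ul=50.0)
print(f"{stock:.1f} nM stock: mix {stock_ul:.1f} ul library + {diluent_ul:.1f} ul diluent")
```

When several indexed samples are pooled, each one would be converted to nM in this way first so that equal molar amounts can be combined.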

G. Protocol for Multiplex RT-PCR amplification of target genes for expression profiling or mutation analysis in single cells using a barcoded reverse gene specific primer set.

Step 1. Individual cells (5,000-10,000) were deposited by FACS in separate wells (or as separate droplets in oil) or partitioned in microdroplets using a microfluidics instrument (MissionBio) together with the barcoded reverse gene specific primer set immobilized on beads through photocleavable linkers (ChemB-T25-PCI-Anc2-BC-L2, ChemGenes, see structure above) in 1×TCL lysis-hybridization buffer (Qiagen) together with the calibration control RNA template set.

Step 2. Barcoded reverse gene specific primers were released from the beads by UV treatment (365 nm, 20 watts) for 5 minutes and hybridized with target RNA templates (present in lysates in separate compartments) at 60° C. for 30 min. The hybridized complexes between the target RNA and the barcoded reverse gene specific primers were combined (after removal of the oil phase for microdroplets) and purified while bound to oligo dT25 beads by washing the beads three times in 1×SSC buffer. The purified target RNA-barcoded reverse gene specific primer complexes were treated with thermosensitive exonuclease I (20 units, New England BioLabs) in 20 μl of 1×GC buffer at 37° C. for 30 min, then at 50° C. for 5 min. In an alternative protocol, the hybridized complexes between the target RNA and the barcoded reverse gene specific primers were combined (after removal of the oil phase for microdroplets), purified using an RNA/DNA micro kit (Qiagen) and treated with thermosensitive exonuclease I.

Step 3. Reverse primer extension step. RNA was converted to cDNA from the barcoded reverse gene specific primers (hybridized to target RNA in Step 2) in 40 μl of reaction mix comprising 1×GC buffer, dNTP (500 μM), ThermaStop-RT (80 units, ThermaGenix) and Maxima Reverse Transcriptase (400 units, Thermo-Fisher) at 55° C. for 30 min.

Step 4. Forward primer extension step. Barcoded cDNA (generated in Step 3) was primed using a mix of Forward-anchor1-GSP primers (5 nM final concentration for each primer) in a 50-μl reaction mix comprising 1×GC buffer, dNTP (250 μM) and Phusion II (10 units, Thermo-Fisher) for 1 cycle at (98° C. for 1 min, 64° C. for 30 min) and treated with exonuclease I (20 units) at 37° C. for 30 min.

Step 5. 1^(st) PCR step. The whole volume (50 μl) of barcoded anchored cDNA fragments (from Step 4) was amplified in a 100-μl reaction mix comprising 1×HF Buffer, dNTP (200 μM), universal PCR primers F-MP1GAC and R-MP2CAG and Phusion II (20 units, Thermo-Fisher) for 14 cycles at (98° C. for 10 sec, 72° C. for 20 sec).

Step 6. 2^(nd) PCR step. A 5-μl aliquot of the 1st PCR was amplified in 100 μl of PCR mix comprising 1×HF Buffer, dNTP (200 μM), indexed (specific for each of several samples) or non-indexed (only for one sample) Fwd and Rev PCR primers and Phusion II (20 units, Thermo-Fisher) for 7 cycles at (98° C. for 10 sec, 72° C. for 20 sec).

Step 7. The amplified PCR products were analyzed in a 3.5% agarose-1×TAE gel to optimize the cycle number and finally digested with exonuclease I (20 units, New England Biolabs) at 37° C. for 30 min, inactivated at 65° C. for 15 min and purified with AMPure beads (1.5× volume) according to the manufacturer's protocol (Beckman-Coulter). Purified PCR products were quantitated by Qubit (Thermo-Fisher) and, if necessary, different samples were mixed together (in equal amounts), diluted to 10 nM and sequenced in a NextSeq500 using the Illumina paired-end protocol and reagents for 150 cycles.

H. Next Generation Sequencing Applications

Recently developed targeted approaches reduce NGS data complexity and generate qualitative sequencing information by measurement of a subset of targets per technical replicate with minimal sample usage. Nonetheless, targeted approaches reported thus far have limited clinical utility due to several scientific challenges, such as determining a priori which genetic markers have the most clinical significance and identifying key genetic variants that are correlated with a specific drug response. Furthermore, technical limitations due to skewed or inaccurate quantitative representation of clinical targets and inter-library variation confound their utility in the clinical setting.

For example, cancer is a complex multigenic disease characterized by diverse genetic and epigenetic alterations. A comprehensive catalog of all types of variants in cancer opens novel and unique opportunities for understanding the mechanism of cancer onset or progression and facilitates a more personalized approach to clinical care, including improved risk stratification and treatment selection. Next-generation sequencing (NGS) is now a major driver in translational and genetic research, providing a powerful way to study DNA or RNA from clinical specimens. For example, transcriptome profiling can unambiguously define a unique gene expression signature for each tumor that may prove useful for both disease classification and prognosis. Unfortunately, both the cost and the complexity of whole genome DNA sequencing or transcriptome RNA-sequencing data sets impede the use of these methodologies in routine molecular diagnostic testing.

Predesigned targeted gene panels disclosed in the current invention contain essential genes associated with human disease or phenotype(s), selected from publications, open access databases/resources, and expert curation. By focusing on the genes most likely to be involved in cellular processes and disease, these targeted RNA-Seq panels conserve next-generation sequencing (NGS) resources and minimize data analysis considerations. Predesigned panels will be unique experimental tools for clinical research on various diseases, such as cancer, and on signaling pathways and markers of cell lineage, differentiation, and activation. Examples below illustrate several assays we developed for clinical research applications.

1. Cell Marker Panel Assay for Profiling Cell Composition

The human Cell Marker Assay is a targeted multiplex RT-PCR panel that enables gene signature-based inference and quantitative evaluation of multiple unique immune and stromal cell types. The Cell Marker Assay provides a cost-effective strategy for quantitative analysis of cell composition in a wide range of clinical samples based on analysis of all well-characterized cell specific biomarkers.

The cell marker gene sets summarize and characterize gene signatures for 64 distinct cell types, spanning multiple adaptive and innate immunity cells, hematopoietic progenitors, epithelial cells, and extracellular matrix cells derived from thousands of published gene expression signatures. To generate our compendium of gene-specific signatures for human cell types, we used data from ENCODE, FANTOM, ImmGen, and the Human Primary Cells Atlas (HPCA). Also, we collected gene expression profiles from the Blueprint project, from which we annotated 144 samples from 28 cell types, and the IRIS project, from which we annotated 95 samples from 13 cell types. We collected and curated gene expression profiles from ~2,310 samples of pure cell types and annotated 64 distinct cell types and cell subsets.

The Human Cell Marker 1.3K targeted panel measures the expression level of 1,285 human protein-coding genes by combining highly multiplexed RT-PCR amplification with the depth and precision of NGS quantitation. The Cell Marker 1.3K panel also includes a set of 85 housekeeping genes with constant expression between different cell types. The Cell Marker 1.3K panel employs a computationally predicted set of PCR primers for multiplex PCR which are functionally (i.e., experimentally) validated, e.g., as described above. The unique multiplex primer design minimizes primer dimerization and cross-reactivity while enhancing specificity of hybridization and efficacy in primer extension steps. The set of calibration control RNA templates was developed for all housekeeping and 1,285 cell marker genes, and is mixed with sample template RNA to be used as internal standards for calibration and QC of all RNA samples employed in the assay. It is an easy-to-run, one-tube assay that can be run directly from cell extract or total RNA (10 pg-50 ng) isolated from cells, tissues, or blood. In this embodiment, the RNA template compositions from different samples are hybridized with a set of barcoded reverse gene specific primers, and the RNA-primer hybrids are combined in a single test tube, purified from unbound primers, and used for follow-up primer extension and amplification steps. The multiplex single-tube assay provides robust, quantitative, and reproducible measurements of each expressed gene in the set of biological samples over as much as five orders of magnitude of difference in expression level.
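Because the RNA-primer hybrids from different samples are pooled into one tube, reads must later be assigned back to their sample of origin by the sample barcode. The Python sketch below illustrates one way this might be done. It assumes each read has already been reduced to a (sample barcode, gene, UMI) record and that the barcode cassette carries a UMI, as in the cassette structure given for the SeqBarcode-Fwd read; the record format, barcodes and gene names are hypothetical.

```python
from collections import defaultdict

def count_umis(records):
    """Count distinct UMIs per sample barcode and gene.

    records: iterable of (sample_barcode, gene, umi) tuples (assumed format).
    Returns {sample_barcode: {gene: number of distinct UMIs}}.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for sample_barcode, gene, umi in records:
        # A set collapses PCR duplicates that share the same UMI.
        seen[sample_barcode][gene].add(umi)
    return {sample: {gene: len(umis) for gene, umis in genes.items()}
            for sample, genes in seen.items()}

# Hypothetical records for two pooled samples.
records = [
    ("AACCGG", "GENE_A", "ACGTACGTACGT"),
    ("AACCGG", "GENE_A", "ACGTACGTACGT"),  # duplicate of the read above
    ("AACCGG", "GENE_A", "TTTTACGTACGT"),
    ("GGTTCC", "GENE_A", "ACGTACGTACGT"),
    ("GGTTCC", "GENE_B", "CCCCACGTACGT"),
]
counts = count_umis(records)
print(counts)
```

This sketch requires exact barcode matches; a practical pipeline would likely also tolerate sequencing errors in the barcode, which is outside the scope of this illustration.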

In another embodiment, the reverse gene specific primer set is immobilized to barcoded beads (one specific barcode per bead) through a cleavable linker. The barcoded bead-gene specific primer conjugates are mixed with the set of individual cells from a cell sample together with calibration control RNAs. One bead-one cell compositions are then isolated in separate compartments by FACS, aliquoting or microdroplet technology. The hybridization of detached barcoded gene specific primers with target mRNAs in solution, combining of all samples together, extension of purified RNA-gene specific primer hybrids and the follow-up RT-PCR-NGS protocol allow a high level of multiplexing in the analysis of thousands of individual cells. Moreover, single cell analysis combined with calibration control templates allows quality control analysis of each individual cell, plus normalization and calibration of the single cell data.
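The text does not specify how the calibration control counts are used for quality control and normalization. The Python sketch below shows one simple possibility, a single per-cell scale factor derived from the controls, purely as an illustration. The function name, the QC threshold, the control names and all numbers are assumptions, not the disclosed method.

```python
def calibrate_cell(gene_counts, control_counts, control_input_copies,
                   min_control_total=10):
    """Scale one cell's gene counts using its calibration control counts.

    gene_counts:          {gene: observed count} for the cell
    control_counts:       {control: observed count} for the same cell
    control_input_copies: {control: copies added per cell} (assumed known)
    Returns calibrated counts, or None if the cell fails the control QC.
    """
    observed = sum(control_counts.values())
    if observed < min_control_total:
        return None  # too few control reads to trust this cell
    expected = sum(control_input_copies[name] for name in control_counts)
    factor = expected / observed  # copies represented by one observed count
    return {gene: count * factor for gene, count in gene_counts.items()}

# Hypothetical cell in which half of the added control copies were recovered.
calibrated = calibrate_cell(
    gene_counts={"GENE_A": 7, "GENE_B": 0},
    control_counts={"CTRL_1": 50, "CTRL_2": 150},
    control_input_copies={"CTRL_1": 100, "CTRL_2": 300},
)
print(calibrated)
```

A per-control or per-gene calibration curve would be a natural refinement, since controls are described as available for individual genes.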

As to utility, the generated normalized quantitative single cell data could be used to accurately profile the specific cell composition in a wide range of clinical samples, including blood, tissue, biological fluids, organoids, isolated cells, organs, etc. in normal, treated, and disease states. The comprehensive set of cell type specific markers included in the assay allows robust and cost-effective cell typing analysis even at the scale of hundreds of thousands of individual cells. Moreover, the Cell Marker 1.3K assay provides a cost-effective strategy for the discovery of novel diagnostic and prognostic cell types and cell specific biomarkers in xenograft, fine needle aspirate (FNA), biopsy, blood, and circulating tumor cell (CTC) clinical samples. Currently, an extended panel of cell markers based on the most recent public data is in development to address specific disease areas.

2. Blood Biomarker Diagnostic Panel

There is ample evidence that development of novel prognostic and predictive biomarkers is a critical step for selecting patients predisposed to respond to existing and novel (e.g., immunotherapy) treatments and their combinations. The Blood Biomarker 10K assay (BB10K) allows one to dissect immunological response mechanisms and discover novel prognostic and predictive immune response gene signatures.

The BB10K assay was developed to quantitatively profile expression of 10,000 key immunity genes expressed in different types of blood cells, using, in the preferred embodiment, single-cell multiplex RT-PCR amplification from total RNA followed by NGS sequencing. The individual cells (10K-100K) from peripheral blood mononuclear cell (PBMC) samples were mixed with cell type-specific antibody-bead-barcoded reverse gene specific primer conjugate and sorted by FACS or microfluidic technologies in separate droplets (compartments) for cell-specific barcoding of target mRNAs directly in cell lysates. Furthermore, each specific bead-antibody conjugate could be optically encoded to facilitate isolation and analysis, e.g., by FACS, of specific single cell-bead conjugates. No mRNA enrichment, beta-globin depletion, or other processing is required. Furthermore, the built-in internal calibration standards allow calibration and adjustment of digital HT sequencing data depending on the level of intrinsic noise and quality of samples. The BB10K assay provides quantitative expression data of immune-related genes with 1,000-fold dynamic range and sensitivity down to 10 copies of RNA per cell in whole cell lysate or cell fractions (e.g., nucleus) from frozen PBMC clinical samples. Up to 100 PBMC clinical samples could be run and combined together (after the hybridization step) in the assay to provide a high level of multiplexing and significantly reduce sample-to-sample variation and batch effect issues.

The BB10K panel includes more than 100 experimentally validated core gene signatures (e.g., based on immunity HallMark database, Broad Institute) which correlate with a wide range of pathological conditions (cancer, cardiovascular, infection, acute pain, etc.), and predict efficacy of immunotherapy in several cancer types, including melanoma, colorectal, breast, and lung cancers after stimulation in vitro or in vivo by drugs, physical treatment, biologics or chemical compounds (heat shock, LPS, bacterial antigens, etc.). Furthermore, the core signatures were expanded by developing a computational functional interaction network model to predict key nodes in pathways specific for antigen presentation and recognition, inhibition and activation of immune cells. The BB10K panel also includes a set of TCR and BCR genes (variable regions) and housekeeping genes with constant expression between different blood cell types. The set of calibration control RNA templates was developed for all housekeeping and 1000 different blood cell type-specific genes for use as internal standards for calibration and QC of all cells employed in the assay.

Single cell expression profiling in all main blood cell types with the BB10K gene panel enables researchers to discover prognostic and predictive immune response biomarker signatures. The predictive signatures have the potential to stratify patients with a wide range of clinical indications for responses to the growing number of therapeutic treatments.

Notwithstanding the appended claims, the disclosure is also defined by the following clauses:

1. A method of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific deoxyribonucleic acid (DNA) fragments from a template ribonucleic acid (RNA) sample, the method comprising:

employing a set of gene specific primer (GSP) pairs to produce the plurality of sample-barcoded anchor-domain-flanked gene specific DNA fragments from the template RNA sample, wherein each pair of gene specific primers (GSPs) is made up of an anchor domain comprising forward primer and an anchor domain comprising reverse primer, at least one of which comprises a sample barcode domain.

2. The method according to Clause 1, wherein each pair of the set of GSPs is made up of a forward primer comprising an anchor domain and a reverse primer comprising an anchor domain and a sample barcode domain.

3. The method according to Clause 2, wherein the method comprises contacting the template RNA sample with the reverse primers of the set to produce a hybrid composition comprising RNA/anchored sample barcoded reverse primer hybrids.

4. The method according to Clause 2, wherein the reverse primers are linked to a solid support.

5. The method according to Clause 4, wherein the solid support is a bead.

6. The method according to Clauses 4 and 5, wherein the reverse primers are linked to the solid support by a cleavable linker.

7. The method according to any of Clauses 3 to 6, wherein the method comprises removing unbound reverse primers from the hybrid composition to produce a hybrid enriched composition.

8. The method according to Clause 7, wherein the method comprises contacting the hybrid enriched composition with the forward primers of the set under primer extension reaction conditions to produce the plurality of sample-barcoded anchor-domain-flanked gene specific DNA fragments from the template RNA.

9. The method according to Clause 1, wherein the method comprises employing a sample-barcoded donor nucleic acid comprising an anchor domain and a sample barcode domain to produce the set of GSPs.

10. The method according to Clause 9, wherein the sample-barcoded donor nucleic acid comprises an RNA capture domain.

11. The method according to Clause 10, wherein the sample-barcoded donor nucleic acid further comprises a first linker, either the forward primers or the reverse primers comprise a second linker and the method further comprises ligating the first and second linkers.

12. The method according to any of Clauses 9 to 11, wherein the sample-barcoded donor nucleic acid comprises the structure: 3′-linker 1-sample barcode domain-anchor 2 domain-RNA binding domain-5′.

13. The method according to Clause 12, wherein the reverse primers comprise the structure: 3′-reverse GSP domain-linker 2-5′; and the forward primers comprise the structure 5′-anchor 1-forward GSP domain-3′.

14. The method according to Clause 13, wherein the method comprises:

contacting the template RNA sample with the sample-barcoded donor nucleic acid and the reverse primers under hybridization conditions;

ligating the linker 1 and linker 2 domains of hybrid sample-barcoded donor nucleic acid and reverse primers to produce sample-barcoded reverse primers;

reverse transcribing first strand complementary DNA (cDNA) molecules from the sample-barcoded reverse primers; and

contacting the first strand cDNA molecules with the forward primers under polymerase mediated primer extension reaction conditions to produce the plurality of sample-barcoded anchor-domain-flanked gene specific DNA fragments from the RNA sample.

15. The method according to any of Clauses 9 to 11, wherein the sample-barcoded donor nucleic acid comprises the structure: 3′-RNA binding domain-anchor 1 domain-sample barcode domain-linker 1 domain-5′.

16. The method according to Clause 15, wherein the forward primers comprise the structure: 3′-forward GSP domain-linker 2-5′; and the reverse primers comprise the structure 5′-anchor 2-reverse GSP domain-3′.

17. The method according to Clause 16, wherein the method comprises:

contacting the template RNA sample with the sample-barcoded donor nucleic acid under conditions sufficient to reverse transcribe first strand cDNA molecules from the template RNA sample;

contacting the first strand cDNA molecules with the forward primers under polymerase mediated primer extension reaction conditions sufficient to produce second strand cDNA molecules comprising a 5′ linker 2 domain and a 3′ linker 1 domain;

ligating the 5′ linker 1 domain to the 3′ linker 2 domain to circularize the second strand cDNA molecules; and

contacting the circularized second strand cDNA molecules with the reverse primers to produce the plurality of sample-barcoded anchor-domain-flanked gene specific DNA fragments from the template RNA sample.

18. The method according to any of the preceding clauses, wherein the forward and reverse primers of each primer pair comprise gene specific (GSP) domains that are experimentally validated as suitable for use in a multiplex amplification assay.

19. The method according to any of the preceding clauses, wherein the forward and reverse primers of each primer pair are separated by a template distance of 60 to 300 nt.

20. The method according to any of the preceding clauses, wherein the template ribonucleic acid sample comprises messenger ribonucleic acids (mRNAs).

21. The method according to any of the preceding clauses, wherein the template RNA sample is obtained from a single cell.

22. The method according to Clause 21, wherein the method comprises obtaining the template RNA sample by isolating the single cell and then lysing the isolated single cell to produce the RNA sample.

23. The method according to any of the preceding clauses, wherein the GSP domain of each forward primer ranges in length from 18 to 25 nt.

24. The method according to any of the preceding clauses, wherein the GSP domain of each reverse primer ranges in length from 30 to 70 nt.

25. The method according to any of the preceding clauses, wherein the flanking anchor domains comprise a universal priming site and the method further comprises amplifying the primer extension products comprising the anchor domains with universal forward and reverse primers having sequences complementary to the universal priming sites under amplification conditions sufficient to produce a barcoded amplicon composition comprising multiple product amplicons.

26. The method according to Clause 25, wherein the universal forward and reverse primers further comprise Next-Generation Sequencing (NGS) adaptor domains.

27. The method according to Clause 26, wherein the method further comprises adding NGS adaptor domains to the multiple product amplicons of the barcoded amplicon composition.

28. The method according to Clause 27, wherein the NGS adaptor domains are added to the multiple product amplicons of the barcoded amplicon composition via an amplification protocol.

29. The method according to any of Clauses 25 to 28, wherein the method further comprises sequencing the multiple product amplicons.

30. The method according to Clause 29, wherein the multiple product amplicons are sequenced using an NGS protocol.

31. The method according to any of the preceding clauses, wherein the method is performed in a well.

32. The method according to any of Clauses 1 to 30, wherein the method is performed in a droplet.

33. The method according to Clause 32, wherein the droplet is produced using a microfluidics protocol.

34. The method according to Clause 32, wherein the droplet is produced using a fluorescence activated cell sorter (FACS) protocol.

35. The method according to any of the preceding clauses, wherein the method comprises a pooling step.

36. The method according to any of the preceding clauses, wherein the method comprises employing a calibration ribonucleic acid composition.

37. A method of preparing a plurality of sample-barcoded anchor-domain-flanked gene specific DNA fragments from a template RNA sample, the method comprising:

contacting the template RNA sample with reverse primers of a set of GSPs, wherein the reverse primers comprise an anchor domain and a sample barcode domain, to produce a hybrid composition comprising RNA/anchored sample barcoded reverse primer hybrids;

removing unbound reverse primers from the hybrid composition to produce a hybrid enriched composition;

reverse transcribing the hybrid enriched composition to produce a cDNA composition; and

contacting the cDNA composition with forward primers of the set, wherein the forward primers comprise an anchor domain, under primer extension reaction conditions to produce the plurality of sample-barcoded anchor-domain-flanked gene specific DNA fragments from the template RNA sample.

38. The method according to Clause 37, wherein the reverse primers further comprise a UMI domain.

39. The method according to Clause 37, wherein the reverse primers are linked to a solid support.

40. The method according to Clause 39, wherein the solid support is a bead.

41. The method according to Clauses 39 and 40, wherein the reverse primers are linked to the solid support by a cleavable linker.

42. The method according to Clause 41, wherein the method further comprises cleaving the cleavable linker.

43. The method according to any of Clauses 39 to 42, wherein the solid support further comprises a specific binding pair member.

44. The method according to Clause 43, wherein the specific binding pair member specifically binds to a cell surface marker.

45. The method according to any of Clauses 39 to 44, wherein the reverse primers further comprise a solid support barcode domain.

46. The method according to any of Clauses 37 to 45, wherein the template RNA sample is obtained from a single cell.

47. The method according to Clause 46, wherein the method comprises obtaining the template RNA sample by isolating the single cell and then lysing the isolated single cell to produce the template RNA sample.

48. The method according to Clause 47, wherein the cell is isolated and lysed in a well.

49. The method according to Clause 47, wherein the cell is isolated and lysed in a droplet.

50. The method according to Clause 49, wherein the droplet is produced using a microfluidics protocol.

51. The method according to Clause 49, wherein the droplet is produced using a fluorescence activated cell sorter (FACS) protocol.

52. The method according to any of Clauses 37 to 51, wherein the method comprises a pooling step.

53. The method according to Clause 52, wherein the pooling step comprises pooling the enriched hybrid composition with at least one additional enriched hybrid composition produced from at least one additional template RNA sample.

54. The method according to any of Clauses 37 to 53, wherein the method comprises employing a calibration ribonucleic acid composition.

55. The method according to any of Clauses 37 to 54, wherein the method further comprises removing non-extended forward primers from the plurality of sample-barcoded anchor-domain-flanked gene specific DNA fragments.

56. The method according to any of Clauses 37 to 55, wherein the GSP domain of each forward primer ranges in length from 18 to 25 nt.

57. The method according to any of Clauses 37 to 56, wherein the GSP domain of each reverse primer ranges in length from 30 to 70 nt.

58. The method according to any of Clauses 37 to 57, wherein the flanking anchor domains comprise a universal priming site and the method further comprises amplifying the primer extension products comprising the anchor domains with universal forward and reverse primers having sequences complementary to the universal priming sites under amplification conditions sufficient to produce a barcoded amplicon composition comprising multiple product amplicons.

59. The method according to Clause 58, wherein the universal forward and reverse primers further comprise Next-Generation Sequencing (NGS) adaptor domains.

60. The method according to Clause 59, wherein the method further comprises adding NGS adaptor domains to the multiple product amplicons of the barcoded amplicon composition.

61. The method according to Clause 60, wherein the NGS adaptor domains are added to the multiple product amplicons of the barcoded amplicon composition via an amplification protocol.

62. The method according to any of Clauses 58 to 61, wherein the method further comprises sequencing the multiple product amplicons.

63. The method according to Clause 62, wherein the multiple product amplicons are sequenced using an NGS protocol.

64. A system comprising:

a nucleic acid amplification device;

a sample-barcoded donor nucleic acid comprising a RNA binding domain, an anchor domain and a sample barcode domain; and

a set of GSPs wherein each pair of GSPs is made up of a forward primer and a reverse primer.

65. The system according to Clause 64, wherein the sample-barcoded donor nucleic acid further comprises a first linker and either the forward primers or the reverse primers comprise a second linker.
66. The system according to Clause 65, wherein the sample-barcoded donor nucleic acid comprises the structure: 3′-linker 1-sample barcode domain-anchor 2 domain-RNA binding domain-5′.
67. The system according to Clause 66, wherein the reverse primers comprise the structure: 3′-reverse GSP domain-linker 2-5′; and the forward primers comprise the structure 5′-anchor 1-forward GSP domain-3′.
68. The system according to Clause 67, wherein the sample-barcoded donor nucleic acid comprises the structure: 3′-RNA binding domain-anchor 1 domain-sample barcode domain-linker 1 domain-5′.
69. The system according to Clause 68, wherein the forward primers comprise the structure: 3′-forward GSP domain-linker 2-5′; and the reverse primers comprise the structure 5′-anchor 2-reverse GSP domain-3′.
70. The system according to any of Clauses 64 to 69, wherein the forward and reverse primers of each primer pair comprise GSP domains that are experimentally validated as suitable for use in a multiplex amplification assay.
71. The system according to any of Clauses 64 to 70, wherein the forward and reverse primers are separated by a template distance of 60 to 300 nt.
72. The system according to any of Clauses 64 to 71, wherein the system further comprises a RNA sample.
73. The system according to Clause 72, wherein the RNA sample is from a single cell.
74. The system according to any of Clauses 64 to 73, wherein the device is a thermal cycler.
75. A system comprising:

a nucleic acid amplification device; and

a set of GSPs wherein each pair of GSPs is made up of a forward primer and a reverse primer comprising a sample barcode domain.

76. The system according to Clause 75, wherein the forward and reverse primers of each primer pair comprise GSP domains that are experimentally validated as suitable for use in a multiplex amplification assay.
77. The system according to any of Clauses 75 to 76, wherein the forward and reverse primers are separated by a template distance of 60 to 300 nt.
78. The system according to any of Clauses 75 to 77, wherein the reverse primers are linked to a solid support.
79. The system according to Clause 78, wherein the solid support is a bead.
80. The system according to any of Clauses 78 to 79, wherein the reverse primers are linked to the solid support by a cleavable linker.
81. The system according to any of Clauses 78 to 80, wherein the solid support further comprises a specific binding pair member.
82. The system according to Clause 81, wherein the specific binding pair member specifically binds to a cell surface marker.
83. The system according to any of Clauses 78 to 82, wherein the reverse primers further comprise a solid support barcode domain.
84. The system according to any of Clauses 75 to 83, wherein the system further comprises a RNA sample.
85. The system according to Clause 84, wherein the RNA sample is from a single cell.
86. The system according to any of Clauses 75 to 85, wherein the device is a thermal cycler.
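
As a non-limiting illustration of the numerical limits recited in the clauses above (forward GSP domains of 18 to 25 nt, reverse GSP domains of 30 to 70 nt, and a template distance of 60 to 300 nt between the forward and reverse primers), the following Python sketch screens candidate primer pairs against those limits. The `PrimerPair` class, its field names, the `check_pair` helper and the example sequences are hypothetical and form no part of the disclosure. The template distance is assumed to be supplied by the user, because how it is measured is not specified here.

```python
from dataclasses import dataclass

# Limits taken from the clauses above; the constant names are illustrative only.
FORWARD_GSP_NT = (18, 25)
REVERSE_GSP_NT = (30, 70)
TEMPLATE_DISTANCE_NT = (60, 300)


@dataclass
class PrimerPair:
    forward_gsp: str        # forward gene specific primer (GSP) domain
    reverse_gsp: str        # reverse GSP domain
    template_distance: int  # nt separating the primers on the template (user supplied)


def in_range(value, limits):
    """True when value lies within the inclusive (low, high) limits."""
    low, high = limits
    return low <= value <= high


def check_pair(pair):
    """Return the names of the recited limits that the pair falls outside of."""
    problems = []
    if not in_range(len(pair.forward_gsp), FORWARD_GSP_NT):
        problems.append("forward GSP length")
    if not in_range(len(pair.reverse_gsp), REVERSE_GSP_NT):
        problems.append("reverse GSP length")
    if not in_range(pair.template_distance, TEMPLATE_DISTANCE_NT):
        problems.append("template distance")
    return problems


# Made-up sequences: a 20 nt forward domain and a 40 nt reverse domain.
good = PrimerPair("ACGT" * 5, "ACGTA" * 8, 150)
# Made-up sequence with a 12 nt forward domain, below the 18 nt lower limit.
short_forward = PrimerPair("ACGT" * 3, "ACGTA" * 8, 150)

print(check_pair(good))
print(check_pair(short_forward))
```

A pair that returns an empty list satisfies all three limits. Such a check addresses length and spacing only and is no substitute for the experimental validation recited in Clauses 70 and 76.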

In at least some of the previously described embodiments, one or more elements used in an embodiment can interchangeably be used in another embodiment unless such a replacement is not technically feasible. It will be appreciated by those skilled in the art that various other omissions, additions and modifications may be made to the methods and structures described above without departing from the scope of the claimed subject matter. All such modifications and changes are intended to fall within the scope of the subject matter, as defined by the appended claims.

It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations). Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”
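
By way of illustration only, the combinations listed for “at least one of A, B, and C” correspond to every non-empty selection from the named terms. The short Python sketch below enumerates them; the helper name `at_least_one_of` is hypothetical. Because the convention is stated to “include but not be limited to” these combinations, the enumeration is not exhaustive of what such a phrase may cover.

```python
from itertools import combinations


def at_least_one_of(*terms):
    """List every non-empty combination of the given terms."""
    return [
        combo
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    ]


# Prints one line per combination, from each term alone up to all terms together.
for combo in at_least_one_of("A", "B", "C"):
    print(" and ".join(combo))
```

For three terms the sketch yields seven combinations, matching the seven listed in the text above.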

In addition, where features or aspects of the disclosure are described in terms of Markush groups, those skilled in the art will recognize that the disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.

As will be understood by one skilled in the art, for any and all purposes, such as in terms of providing a written description, all ranges disclosed herein also encompass any and all possible sub-ranges and combinations of sub-ranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art, all language such as “up to,” “at least,” “greater than,” “less than,” and the like include the number recited and refer to ranges which can be subsequently broken down into sub-ranges as discussed above. Finally, as will be understood by one skilled in the art, a range includes each individual member. Thus, for example, a group having 1-3 articles refers to groups having 1, 2, or 3 articles. Similarly, a group having 1-5 articles refers to groups having 1, 2, 3, 4, or 5 articles, and so forth.
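
As a non-limiting sketch of the two readings of a range given above, the following Python code lists the individual members of an integer range and splits a range into equal sub-ranges. The helper names `members` and `split_range` are hypothetical, and the 60 to 300 nt range is used only as a sample input taken from the clauses.

```python
def members(low, high):
    """Individual integer members of the inclusive range low-high."""
    return list(range(low, high + 1))


def split_range(low, high, parts):
    """Split the range low-high into `parts` equal, contiguous sub-ranges."""
    width = high - low
    edges = [low + width * i / parts for i in range(parts + 1)]
    return list(zip(edges[:-1], edges[1:]))


print(members(1, 3))            # a group having 1-3 articles
print(split_range(60, 300, 3))  # lower, middle and upper thirds of 60 to 300 nt
```

Here `members(1, 3)` gives 1, 2 and 3, and splitting 60 to 300 into thirds gives sub-ranges with boundaries at 140 and 220.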

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

Accordingly, the preceding merely illustrates the principles of the invention. It will be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope. Furthermore, all examples and conditional language recited herein are principally intended to aid the reader in understanding the principles of the invention and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents and equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.

The scope of the present invention, therefore, is not intended to be limited to the exemplary embodiments shown and described herein. Rather, the scope and spirit of the present invention is embodied by the appended claims. In the claims, 35 U.S.C. § 112(f) or 35 U.S.C. § 112(6) is expressly defined as being invoked for a limitation in the claim only when the exact phrase “means for” or the exact phrase “step for” is recited at the beginning of such limitation in the claim; if such exact phrase is not used in a limitation in the claim, then 35 U.S.C. § 112(f) or 35 U.S.C. § 112(6) is not invoked.

1-63. (canceled) 64-74. (canceled)
75. A system comprising: a nucleic acid amplification device; and a set of GSPs wherein each pair of GSPs is made up of a forward primer and a reverse primer comprising a sample barcode domain.
76. The system according to claim 75, wherein the forward and reverse primers of each primer pair comprise GSP domains that are experimentally validated as suitable for use in a multiplex amplification assay.
77. The system according to claim 75, wherein the forward and reverse primers are separated by a template distance of 60 to 300 nt.
78. The system according to claim 75, wherein the reverse primers are linked to a solid support.
79. The system according to claim 78, wherein the solid support is a bead.
80. The system according to claim 78, wherein the reverse primers are linked to the solid support by a cleavable linker.
81. The system according to claim 78, wherein the solid support further comprises a specific binding pair member.
82. The system according to claim 81, wherein the specific binding pair member specifically binds to a cell surface marker.
83. The system according to claim 78, wherein the reverse primers further comprise a solid support barcode domain.
84. The system according to claim 78, wherein the system further comprises a RNA sample.
85. The system according to claim 84, wherein the RNA sample is from a single cell.
86. The system according to claim 78, wherein the device is a thermal cycler.
87. A kit comprising: a set of GSPs wherein each pair of GSPs is made up of a forward primer and a reverse primer at least one of which comprises a sample barcode domain; and a multi-well plate.
88. The kit according to claim 87, wherein the reverse primers comprise a sample barcode domain.
89. The kit according to claim 87, wherein the forward and reverse primers of each primer pair comprise GSP domains that are experimentally validated as suitable for use in a multiplex amplification assay.
90. The kit according to claim 87, wherein the reverse primers are linked to a solid support.
91. The kit according to claim 90, wherein the solid support is a bead.
92. The kit according to claim 91, wherein the reverse primers are linked to the solid support by a cleavable linker.
93. The kit according to claim 87, wherein the multi-well plate comprises 96 or more wells.
94. The kit according to claim 93, wherein the multi-well plate comprises 384 or more wells.
95. The kit according to claim 94, wherein the multi-well plate comprises 2000 or more wells.
96. The kit according to claim 87, wherein the kit further comprises a reverse transcriptase.
97. The kit according to claim 87, wherein the kit further comprises a DNA polymerase.
98. The kit according to claim 97, wherein the kit further comprises forward and reverse anchor primers.
99. The kit according to claim 87, wherein the kit further comprises a nuclease.
100. The kit according to claim 87, wherein the kit further comprises dNTPs.
101. The kit according to claim 87, wherein the kit further comprises a control.